PROJECT SUMMARY


How  do  listeners  determine  whether  two  tonal  sequences  have 
the  same  or  different  temporal  patterns?  Three  studies  of 
temporal pattern discrimination were conducted. Listener
performance  in  these  experiments  was  evaluated  using  a 
mathematical  model  of  temporal  pattern  discrimination.  Analyses 
of  these  experiments  allow  specification  of  the  temporal  pattern 
discrimination  mechanisms  employed  by  the  human  auditory  system. 

The  first  study  consisted  of  experiments  that  tested  how 
listeners  discriminate  between  arrhythmic,  tonal  sequences 
approximately  one  half-second  in  duration.  Performance  was 
modeled  by  the  Pattern  Correlation  Model.  According  to  the 
model,  the  listener  extracts  a  list  of  marker  interonset  times 
from  each  pattern,  and  then  computes  the  correlation  between  the 
pattern  of  time  intervals  marked  by  the  tones  in  each  sequence; 
other  information  about  the  input  waveforms  (such  as  absolute 
timing  or  spectra)  is  discarded.  The  experiments  tested  how 
listener  performance  depended  on  basic  parameters  of  the  task, 
such  as  sequence  correlation,  and  number,  duration,  and 
variability of pattern elements. Listener performance was
consistent  with  the  predictions  of  the  Pattern  Correlation  model, 
but  was  limited  by  an  internal  time  jitter  or  noise  that  was  a 
function  of  the  average  intermarker  interval. 

The  second  study  evaluated  the  human  listener's  ability  to 
discriminate  between  word-length  tonal  sequences  that  were 
subjected to uniform temporal transformations, such as time
compression  and  expansion.  One  of  the  most  intriguing  features 
of  temporal  pattern  perception  is  the  ability  to  recognize 
patterns  as  similar,  despite  such  compression  or  expansion 
manipulations.  Examples  of  such  time  normalization  abound  in 
speech  and  music  perception  and  we  are  normally  unaware  of  such 
temporal  changes,  even  when  they  occur  during  relatively  brief 
stimuli,  such  as  words.  The  experiments  in  the  second  study  also 
tested  how  well  the  Pattern  Correlation  model  could  predict  the 
effects  of  time  compression  and  expansion  on  listener 
performance.  The  model  proved  useful  in  describing  performance 
in  a  variety  of  different  conditions  that  employed  multiplicative 
and  additive  time  transformations.  Listener  performance  dropped 
when  one  of  the  sequences  was  compressed  or  expanded  in  time.  In 
order  for  the  model  to  describe  this  performance,  it  was 
necessary  to  postulate  an  additional,  internal  noise  component 
that  was  proportional  to  the  magnitude  of  the  difference  between 
the  sequence  transformations. 

The  purpose  of  the  third  study  was  to  evaluate  the 
possibility  that  different  pattern  comparison  mechanisms  operate 
under  different  task  conditions.  The  experiments  evaluated 
discrimination  when  the  two  patterns  began  at  delayed  starting 
times.  The  patterns  were  presented  at  different  frequencies  and 
to  different  ears,  and  were  subjected  to  multiplicative 
compressions  and  expansions.  Listeners  performed  well  even  when 
the patterns contained tones of different frequency and in spite
of  the  patterns  being  presented  to  separate  earphone  channels. 

Performance  was  good  when  the  sequences  were  presented 
either  (near)  simultaneously  or  at  relatively  long  time  delays. 
When  the  time  between  pattern  onsets  was  less  than  10-ms, 
discrimination  was  very  sensitive  to  the  expansion  or  compression 
manipulation,  indicating  that  discrimination  in  this  region  was 
based  on  the  process  of  waveform  correlation.  At  longer  time 
separations,  performance  was  relatively  insensitive  to  such 
transformations,  consistent  with  the  Pattern  Correlation 
hypothesis. Thus the results support a two-phase mechanism: when
the sequence delay is less than 20 ms, the binaural waveform
correlator is the active mechanism; when the sequence delay is
greater than 20 ms, the pattern correlator is the active
mechanism. Moreover, the efficiency of the pattern correlation
mechanism  is  very  poor  when  the  sequences  overlap  in  time.  It 
appears  that  the  sequential  presentation  of  stimulus  patterns  may 
be  a  requirement  for  the  pattern  correlator  to  function. 
I. Perception of temporal patterns defined by tonal sequences


Robert D. Sorkin

Department of Psychology, University of Florida, Gainesville, Florida 32611

(Received  23  June  1989;  accepted  for  publication  6  November  1989) 

This  experiment  tested  how  listeners  discriminate  between  the  temporal  patterns  defined  by 
two  sequences  of  tones.  Two  arrhythmic  sequences  of  n  tones  were  played  successively  (n  =  8, 
12,  or  16,  tone  duration  =  35  ms,  frequency  =  1000  Hz),  and  the  listener  reported  whether  the 
sequences  had  the  same  or  different  temporal  patterns.  In  the  first  sequence,  the  durations  of 
the  intertone  gaps  were  chosen  at  random;  in  the  second  sequence,  the  gaps  were  either  (a)  the 
same  as  the  first  sequence  or  (b)  chosen  at  random.  Discrimination  performance  increased 
with the variability of the gap sequences and decreased with the size of the correlation between
the sequences. A discrimination model based on computation of the sample correlation
between the sequences of gaps, but limited by an internal variability of approximately 15 ms,
described  observer  performance  in  a  variety  of  conditions. 


PACS  numbers:  43.66.Mk,  43.66.Ba  [WAY] 


INTRODUCTION 

How do listeners discriminate between the temporal patterns
defined by two tonal sequences? The answer to this question may be
relevant to important issues in the perception of speech and
musical patterns. We report on some experiments and propose a
model for describing behavior in tasks in which a listener must
decide whether two arrhythmic, equitone sequences have the same or
different temporal patterns.

Several investigators have studied the perception of partially
unstructured or arrhythmic temporal sequences. Lunney (1974)
showed that the discrimination of irregularity in tempo,
introduced into the fourth click of the output of a metronome, was
an exponential function of the period, in a range of period
durations from 30 to 3200 ms. Pollack studied the perception of
temporal gaps within trains of very brief pulses (Pollack, 1967,
1968a) and the perception of periodicity and jitter in pulse
trains (Pollack, 1968b, c, d). Pollack found that the threshold
for gap discrimination increased with the interpulse interval, for
interpulse intervals greater than 10 ms. In general, performance
was best when the pulse trains contained large numbers of
intervals and had very short interpulse intervals. Pollack
suggested that the processing of trains with very short interpulse
intervals probably involved a spectral mode of processing, while
long interpulse intervals (> 10 ms) probably required a temporal
processing mode.

Sorkin et al. (1982) studied the perception of tone sequences with
randomly jittered temporal patterns. Their subjects heard two
sequences of n tones: One sequence had a fixed intertone interval
and the other had jitter added to the intertone intervals.
Subjects had to detect which sequence had the added jitter. Sorkin
et al. found that discrimination improved with the number of
intervals and decreased with the average duration of the intervals
(the durations ranged from 20 to 110 ms). Their results were
consistent with temporal discrimination data employing single,
marked time intervals (Creelman, 1962; Getty, 1975; Divenyi and
Danner, 1977; Divenyi and Sachs, 1978; and Allen, 1979).


Sorkin et al. (1982) proposed a statistical model of jitter
detection, in which the timing of different frequency tones was
monitored (and compared) across separate critical band channels;
discrimination of time jitter within a critical band channel was
much better than across channels. Performance increased in the
expected way with the number of tones in each sequence and with
the different regular frequency patterns employed. However, when
the frequency patterns were random, listener performance was very
much below the model's predictions.

In a similar experiment, Halpern and Darwin (1982) presented
subjects with a sequence of four clicks which marked three
intervals; their subjects had to indicate whether the last
interval was shorter or longer than the preceding two. Halpern and
Darwin tested base durations ranging from 400 to 1450 ms.
Discrimination performance, as measured by the standard deviation
of the resulting psychometric functions, was an increasing
function of the base duration; the resulting Weber fraction was
about 0.05, consistent with that reported by Getty (1975).

Recently, Schulze (1989) reported a variation of the Halpern and
Darwin experiment in which subjects were asked to report whether
the last of n intervals marked by tones was longer than or the
same as the n - 1 preceding intervals. Schulze used base durations
from 50 to 400 ms and from two to six intervals in each sequence.
Schulze tested a hypothesis similar to that of the Sorkin et al.
(1982) model about the expected improvement in discriminability
with number of intervals. Discrimination improved with the number
of intervals, for most of the subjects. Schulze failed to find
evidence for a Weber's law effect; for his subjects, the
discrimination limen was between 5 and 15 ms and independent of
the base duration.

In the present experiment the listener was asked to compare two
arrhythmic tonal sequences and report whether the temporal
patterns were the same or different. The two sequences were either
identical or had partially correlated temporal envelopes. This
task is a generalization of the Sorkin et al. (1982)
jitter-detection paradigm. An advantage of these paradigms is that
the information-carrying aspects of the sequences are distributed
throughout the sequence, rather than concentrated on one judged
interval as in the Halpern and Darwin (1982) and Schulze (1989)
experiments. The goal of the present experiment was to test
whether a listener's ability to perform sequence comparison can be
described by a process in which the listener computes the
correlation between the sequence temporal envelopes.

I.  METHOD 

Listeners compared pairs of tone sequences composed of n 1000-Hz
tone bursts of 35-ms duration and approximately 71-dB
sound-pressure level. Tone bursts were shaped by a 4-ms linear
rise and decay envelope. After listening to the pair of tone
sequences presented on each experimental trial, the subject had to
respond whether the temporal pattern of tones was the same or
different. There were two types of experimental trials: trials on
which the identical sequence of tones and intertone intervals
(gaps) was presented (SAME trials) and trials on which the pattern
of intertone gaps was different in the two presented sequences
(DIFFERENT trials). On trials when the sequences were different,
the only difference between the sequences was in the pattern of
intertone gaps and tone onsets. The first part of Fig. 1
illustrates a SAME trial; the second part illustrates a DIFFERENT
trial. The type of trial was chosen at random, with
p(SAME) = 0.5.

The intertone gaps were generated by a process that enabled the
experimenter to control the mean gap duration $\mu_{gap}$, the
standard deviation of the gaps, $\sigma_{gap}$, and the
correlation $\rho_{ts}$ between the two gap sequences on trials
when the sequences were different. The intertone gaps were
constructed by combining three independently generated normal
deviates, with one deviate common to the two sequences (see
Appendix). Gap durations of less than 2 ms were not allowed. The
sequence correlation is given by the ratio of two variances, the
variance common to the two sequences divided by the sum of the
common and unique variances (Jeffress and Robinson, 1962):

$$\rho_{ts} = \frac{\sigma_{com}^{2}}{\sigma_{com}^{2} + \sigma_{un}^{2}},$$

where com and un refer, respectively, to the common and unique
portions.

FIG. 1. The envelopes of typical tone sequences are shown for (a) same and
(b) different trials.
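
The gap-generation scheme described above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function name `make_gap_pair` and its parameters are hypothetical, the common and unique variances are set to $\rho\sigma_{gap}^{2}$ and $(1-\rho)\sigma_{gap}^{2}$ so that their ratio matches the correlation definition, and gaps shorter than 2 ms are simply raised to 2 ms (the text says only that such gaps were not allowed, not how they were handled).

```python
import numpy as np

def make_gap_pair(n_gaps, mean_gap, sd_gap, rho, rng, min_gap=2.0):
    """Return two gap sequences (ms) for a DIFFERENT trial.

    Each sequence is the mean plus a deviate common to both sequences
    plus a deviate unique to that sequence (three deviates in total).
    """
    sd_com = sd_gap * np.sqrt(rho)        # common standard deviation
    sd_un = sd_gap * np.sqrt(1.0 - rho)   # unique standard deviation
    common = rng.normal(0.0, sd_com, n_gaps)
    gaps_1 = mean_gap + common + rng.normal(0.0, sd_un, n_gaps)
    gaps_2 = mean_gap + common + rng.normal(0.0, sd_un, n_gaps)
    # Assumption: gaps below the minimum are raised to the minimum.
    return np.maximum(gaps_1, min_gap), np.maximum(gaps_2, min_gap)

rng = np.random.default_rng(0)
g1, g2 = make_gap_pair(11, 50.0, 20.0, 0.6, rng)
```

On a SAME trial the second sequence would simply repeat the first, which corresponds to a correlation of 1.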

One male and three female undergraduate students at the University
of Florida served as observers; they were paid an hourly wage plus
an incentive for correct responses. Listeners had normal hearing
and performed the tasks for approximately 2 h per day, 3 days per
week. Listeners were seated in a double-walled acoustically
insulated chamber. The stimuli were presented monaurally via
TDH-39 headphones. The conditions were tested in 100-trial blocks;
typically, 8 blocks were completed in a session. The two sequences
to be discriminated on each trial were presented with a 750-ms
intersequence separation; full feedback about the correct response
was provided after each trial.


II.  CORRELATION  MODEL 

A straightforward model of observer performance in the temporal
pattern discrimination task follows from the assumption that the
observer computes the correlation between the two sequences of
gaps presented on each trial. Suppose that the observer's response
is based on the value of the Pearson product-moment correlation
coefficient statistic $r_{12}$ computed on the sample of intertone
gaps defined by the pair of sequences. A transformation of the
correlation coefficient, known as the Fisher r-to-Z
transformation, is defined as

$$Z = \tfrac{1}{2}\ln\left[\frac{1 + r_{12}}{1 - r_{12}}\right]. \qquad (3)$$

The sampling distribution of Z is approximately normal, for gaps
drawn from a normal distribution and for n of at least moderate
size ($n \geq 10$). If $\rho$ is the population correlation
coefficient, the mean and standard deviation of Z are then given
by (Brunk, 1960)

$$\mu_{Z} \cong \tfrac{1}{2}\ln\left(\frac{1 + \rho}{1 - \rho}\right) + \frac{\rho}{2n - 1} \qquad (4)$$

and

$$\sigma_{Z} \cong (n - 3)^{-1/2}. \qquad (5)$$

Discrimination performance can be obtained from the normalized
difference between the means of the Z statistic, given the
possible hypotheses on a trial: SAME when $\rho = 1.0$ and
DIFFERENT when $\rho = \rho_{ts}$. The discriminability d' is
given by the difference between the means of the Z statistic
divided by the standard deviation of Z. [The contribution of the
right-hand term of Eq. (4) is very small.]

For a human observer, the effective correlation between the
sequences on DIFFERENT trials will depend on $\rho_{ts}$,
$\sigma_{gap}$, and the magnitude of internal variability in the
observer's encoding and storage of the gaps. We assume that the
observer's observation of the gaps is subject to a temporal jitter
$\sigma_{in}$, and that this jitter is uncorrelated across the gap
sequences. Adding this uncorrelated jitter $\sigma_{in}^{2}$ to
Eqs. (1) and (2) yields

$$\rho_{DIFF} = \frac{\sigma_{com}^{2}}{\sigma_{com}^{2} + \sigma_{un}^{2} + \sigma_{in}^{2}} = \frac{\rho_{ts}}{1 + (\sigma_{in}/\sigma_{gap})^{2}}, \qquad (6)$$

and from Eqs. (1) and (2) and $\rho = 1.0$, the effective
correlation on SAME trials,

$$\rho_{SAME} = \left[1 + (\sigma_{in}/\sigma_{gap})^{2}\right]^{-1}. \qquad (7)$$

The magnitude of the internal temporal jitter $\sigma_{in}$ is the
single parameter of the model. Because the internal jitter is
independent between the two sequences, it acts to reduce the
effective correlation of the sequences.

Discrimination performance can be calculated using Eqs. (4), (6),
and (7) to compute the difference between the means of the Z
statistic on DIFFERENT and SAME trials divided by the standard
deviation of Z:

$$d' = \left[\tfrac{1}{2}\ln\left(\frac{1 + \rho_{SAME}}{1 - \rho_{SAME}}\right) + \frac{\rho_{SAME}}{2n - 1} - \tfrac{1}{2}\ln\left(\frac{1 + \rho_{DIFF}}{1 - \rho_{DIFF}}\right) - \frac{\rho_{DIFF}}{2n - 1}\right] \times (n - 3)^{1/2}. \qquad (8)$$
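
As an illustration of how Eqs. (4) and (6)-(8) combine, the following Python sketch computes the predicted d' for a given sequence correlation, gap standard deviation, internal jitter, and sample size. The function and argument names are hypothetical, the second term of Eq. (4) is taken as $\rho/(2n - 1)$, and n is assumed to be the number of gap pairs entering the correlation. The internal jitter must be greater than zero; otherwise the SAME-trial correlation equals 1 and Z is undefined.

```python
import math

def mean_z(rho, n):
    """Approximate mean of Fisher's Z for population correlation rho, Eq. (4)."""
    return 0.5 * math.log((1.0 + rho) / (1.0 - rho)) + rho / (2 * n - 1)

def d_prime(rho_ts, sd_gap, sd_in, n):
    """Predicted discriminability of SAME versus DIFFERENT trials, Eq. (8)."""
    attenuation = 1.0 + (sd_in / sd_gap) ** 2
    rho_diff = rho_ts / attenuation   # Eq. (6)
    rho_same = 1.0 / attenuation      # Eq. (7)
    # Dividing by sigma_Z = (n - 3) ** -0.5 is multiplying by sqrt(n - 3).
    return (mean_z(rho_same, n) - mean_z(rho_diff, n)) * math.sqrt(n - 3)

# Hypothetical example values: correlation 0.6, 20-ms gap standard
# deviation, 15-ms internal jitter, 11 gaps.
example = d_prime(0.6, 20.0, 15.0, 11)
```

In this sketch the predicted d' falls as the sequence correlation rises and grows as the gap standard deviation increases relative to the internal jitter, which is the qualitative pattern the model is meant to capture.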

III. EXPERIMENT 1: EFFECT OF SEQUENCE
CORRELATION AND VARIABILITY

The purpose of this experiment was to examine how discrimination
performance depended on the correlation between the sequences
$\rho_{ts}$ (as specified on DIFFERENT trials, since $\rho = 1$ on
SAME trials) and the standard deviation of the intertone gaps
$\sigma_{gap}$, and to estimate the magnitude of the internal
noise $\sigma_{in}$.

A.  Procedure 

Observers were run in conditions using a range of different gap
sequence correlations (from 0 to 0.8) at a fixed gap standard
deviation of 20 ms (experiment 1a), and then at gap standard
deviations of 10, 20, 30, and 40 ms at a gap correlation of 0.6
(experiment 1b). The gap correlation and gap standard deviation
were fixed within a block of 100 trials. The conditions were run
in sequences of blocks having different gap correlations and a
fixed gap standard deviation or in sequences of blocks having
different gap standard deviations and a fixed gap correlation.
Table I summarizes the values for the different variables in the
experiment. The order of gap correlation or gap standard deviation
was randomized over the sequence of blocks. Listeners ran
approximately 9000 trials before data collection was begun; no
effects of practice were evident after this training period. The
data indicated no strong response biases and no apparent
relationship between the listeners' criteria and the conditions
run. Sorkin (1962) extended detection theory to the same-different
paradigm and considered some of the methodological questions
involved.

TABLE I. Summary of conditions and variables for the pattern
discrimination experiments. (All durations in milliseconds.)

B.  Results  and  discussion 

Figure 2(a)-(d) shows the data from four observers at a mean gap
duration of 30 ms and a gap standard deviation of 20 ms. Figure 3
shows the data averaged over the four observers at a gap mean of
50 ms. The vertical bars in the figures indicate plus and minus
one standard error of the mean; in Fig. 3, these are the average
of the standard errors for the four listeners in each condition.
The solid lines in Fig. 2 are least-squares fits of the model to
each observer's average data; the value of the internal jitter
parameter is shown in each section of the figure. In Fig. 3, the
model is fit to the average data.

The observed drop in listener performance with increases in the
correlation of the sequences is consistent with the model.
Discrimination performance should drop as the sequence correlation
is increased, since the magnitude of any observable differences
between the sequences must decrease as their temporal envelopes
become more highly correlated. The value of the (single) internal
temporal jitter parameter was 14.75 ms, for the fit of the model
to the average data from the four listeners. This value for the
internal jitter is at the high end of the range of values obtained
in duration discrimination experiments employing single and
multiple judged intervals (Lunney, 1974; Getty, 1975; Divenyi and
Danner, 1977; Halpern and Darwin, 1982; Sorkin et al., 1982; and
Schulze, 1989). This value will be used for all subsequent fits of
the model.

Figure 4 shows how average performance depended on the standard
deviation of the gap duration. The vertical bars indicate plus and
minus one standard error of the mean; the average standard errors
for the four observers are shown for each condition. The solid
line is the prediction of the correlation model, using the value
of the internal jitter (based on the average data) of Fig. 3.
According to the model, as the level of external variability in
the gaps increases, the contribution of internal and (assumed)
uncorrelated variability is reduced, and performance should
improve. It is apparent that the model overestimates performance
at high standard deviations of the gap.

IV.  EXPERIMENT  2:  EFFECT  OF  GAP  DURATION  AND 
NUMBER 

The  purpose  of  the  second  experiment  was  to  examine 
how  discrimination  performance  depended  on  the  mean  gap 
duration  and  on  the  number  of  intertone  gaps,  n,  and  to 
compare  these  observations  to  the  predictions  of  the  model. 

A.  Procedure 

Listeners ran two conditions in which the gap sequence correlation
was fixed at 0.35, the gap standard deviation was fixed at 20 ms,
and the mean and number of intertone gaps were varied. As in
experiment 1, the gap sequence correlation, gap standard
deviation, mean gap, and number of gaps were fixed within a block
of 100 trials. The observers were run in sequences of blocks of
fixed mean gap duration (or fixed gap and number); the order of
conditions was randomized over blocks. Table I summarizes the
experimental conditions. In experiment 2a, the mean gap condition,
the number of gaps was fixed at 11 and the mean gap was either 19,
50, or 81 ms. In experiment 2b, the number of gaps condition,
there were three gap-number and mean-gap pairings: 7 gaps with a
mean of 81 ms, 11 gaps with a mean of 39 ms, and 15 gaps with a
mean of 19 ms. These values were chosen so that the total duration
of the gap sequence would be fixed at approximately 850 ms. The
values of n and $\mu_{gap}$ were chosen to allow testing of a
range of gap durations, subject to the constraint of avoiding
excessively long stimulus sequences.

FIG. 3. The average performance of four observers (d') is plotted
as a function of the sequence correlation. The solid line is the
prediction of the correlation model with an internal noise of
14.75 ms.

FIG. 4. The average performance of four observers (d') is plotted
as a function of the standard deviation of the gaps. The solid
line is the prediction of the correlation model with an internal
noise of 14.75 ms.

B. Results and discussion

Figure 5 shows the average performance in the mean gap condition
as a function of the magnitude of the mean gap. As the mean gap
was increased, observer performance decreased at an increasing
rate. The model, as defined by Eqs. (6)-(8), made no assumption
about the dependence of performance upon $\mu_{gap}$. However, it
is reasonable to expect that a Weber's law relationship would
hold, such that the magnitude of the internal jitter $\sigma_{in}$
would increase with the duration of the intervals to be judged.
Such a relationship, where $\sigma_{in}$ increases in proportion
to $\mu_{gap}$, has been found by Lunney (1974), Getty (1975),
Divenyi and Danner (1977), Halpern and Darwin (1982), and Sorkin
et al. (1982).

In order to quantify the contribution of a Weber's law dependence
of performance on gap duration in the present experiment, we set
the internal jitter equal to a linear function of the mean gap
duration:

$$\sigma_{in} = A + B\mu_{gap}, \qquad (9)$$

where A and B are constants. To estimate the parameters of the
function, we reexamined the jitter discrimination data reported in
our earlier study of sequence discrimination (Sorkin et al.,
1982). In that study, listeners had to detect the presence of
jitter added to equitone or binary tone sequences. That is, let

$$\sigma_{d'=1.0} = A + B\mu_{gap}, \qquad (10)$$

where $\sigma_{d'=1.0}$ is the standard deviation of the jitter
discriminable at a d' = 1.0. The value of A in the Sorkin et al.
study varied depending on the type of sequences tested. However,
the slope B was relatively constant, at least for the equitone and
alternating tone conditions. The slope was approximately 0.04 and
0.07 for subjects P and S, respectively (see Figs. 6 and 7 in
Sorkin et al., 1982, for the equitone and alternating tone
conditions, n = 10, and mean durations of 20-110 ms). For the
current purpose, we chose an intermediate value of 0.05 for the B
parameter. This value closely agrees with the Weber fractions
obtained by Lunney (1974), Getty (1975), Divenyi and Danner
(1977), and Halpern and Darwin (1982).

FIG. 5. The average performance of four observers (d') is plotted
as a function of the mean gap duration. The solid line is the
prediction of the correlation model revised to incorporate the
effect of mean gap (see text).

To estimate the value for the A parameter in the current
experiment, we substituted B = 0.05, $\mu_{gap}$ = 50, and
$\sigma_{in}$ = 14.75 ms in Eq. (9) (recall that
$\sigma_{in}$ = 14.75 ms is the value of the internal noise
obtained in experiment 1 at $\mu_{gap}$ = 50 ms). This yielded a
value for A of 12.25 ms. The resulting expression for
$\sigma_{in}$ was then employed in Eqs. (6) and (7) for the
computation of d'.
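
The arithmetic behind the A parameter can be checked with a few lines of Python. This is a sketch only; the function names are hypothetical, and the default values are the ones quoted in the text (B = 0.05, and A solved from the 14.75-ms jitter at a 50-ms mean gap).

```python
def solve_intercept(sd_in_ms, slope, mean_gap_ms):
    """Solve Eq. (9), sd_in = A + B * mean_gap, for the intercept A."""
    return sd_in_ms - slope * mean_gap_ms

def internal_jitter(mean_gap_ms, intercept=12.25, slope=0.05):
    """Internal jitter (ms) as a linear function of the mean gap, Eq. (9)."""
    return intercept + slope * mean_gap_ms

a_ms = solve_intercept(14.75, 0.05, 50.0)   # 14.75 - 2.5 = 12.25 ms
jitter_by_gap = {gap: internal_jitter(gap) for gap in (19.0, 50.0, 81.0)}
```

Under these values the internal jitter changes by only a few milliseconds across the mean gaps tested, so the revised model can predict only a modest effect of gap duration.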

The prediction of the revised model is plotted as the solid line
in Fig. 5; although the model's performance drops with increasing
gap size, the drop is much less than that shown by the human
observers at 80 ms. Some part of this performance drop at long gap
means may be attributable to the fact that as the mean gap is
increased, the total duration of the sequences becomes quite long.
At mean gap durations of 19, 50, and 81 ms, the sequence spans are
approximately 0.6, 1, and 1.3 s. An observer also must hold the
information in the first sequence over the intersequence interval
of 750 ms. It is possible that spans approaching 1 s or longer
exceed the capacity of the observer's auditory memory, and hence
the effective number of intervals being processed is much smaller
than assumed by the model (see Watson, 1987).
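
The quoted sequence spans can be reproduced approximately with a short calculation. The sketch below rests on assumptions not stated explicitly in the text: the span is measured from the onset of the first tone to the offset of the last, a sequence with n gaps contains n + 1 tones, and each tone lasts 35 ms (the duration given in the abstract). The function name is hypothetical.

```python
def sequence_span_ms(n_gaps, mean_gap_ms, tone_ms=35.0):
    """Expected span of a sequence: (n_gaps + 1) tones plus n_gaps gaps."""
    return (n_gaps + 1) * tone_ms + n_gaps * mean_gap_ms

# Experiment 2a: 11 gaps at mean gaps of 19, 50, and 81 ms.
spans_2a = [sequence_span_ms(11, gap) for gap in (19.0, 50.0, 81.0)]

# Experiment 2b: gap number and mean gap traded to hold the span fixed.
spans_2b = [sequence_span_ms(n, gap) for n, gap in ((7, 81.0), (11, 39.0), (15, 19.0))]
```

With these assumptions the experiment 2a spans come out near 0.6, 1, and 1.3 s, and the three experiment 2b pairings all fall close to 850 ms.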

Figure 6 shows the average performance of the observers as a
function of the number of intertone gaps. Both the number of tones
(and gaps) and the mean gap were manipulated, in order that the
total duration of the sequence span would be held constant at
approximately 0.85 s. Performance increased between 7 and 11 gaps
and then leveled off. The solid curve shows the prediction of the
revised model, using Eq. (9) and the values of A and B specified
in the preceding paragraphs. The dashed curve is the model
prediction based on an internal jitter that is independent of the
mean gap (set equal to the prediction of the former model at
n = 7). Both versions of the model overpredict performance at n
equal to 15.

V.  GENERAL  DISCUSSION 

I have tried to show that the discrimination of differences
between temporally perturbed tone sequences may be described as a
process in which the listener computes the correlation between the
temporal envelopes of the sequences. This computation appears to
be limited by an internal temporal variability, or noise, in the
listener's encoding and storage of the stimulus information. In
this study, the magnitude of the internal noise was approximately
15 ms. This is about 5-10 ms higher than difference thresholds
obtained using two-interval duration discrimination tasks.
Consistent with the results of other studies, the level of the
internal noise was dependent on the magnitude of the base duration
to be discriminated. Performance was degraded when the time span
of the sequences to be compared was longer than 1 s. Performance
also was degraded when the listener was required to compare
sequences having more than 12 intervals. These latter two effects
probably are related to limitations in memory capacity or to the
listener's use of a temporal window that is not uniform over the
sequences.

FIG. 6. The average performance of four observers (d') is plotted
as a function of the number of gaps (average sequence duration is
fixed). The solid line is the prediction of the correlation model
revised to incorporate the effect of mean gap
($\sigma_{in} = 12.25 + 0.05\mu_{gap}$). The dashed line is the
prediction of the correlation model with a fixed internal noise of
12.25 + (0.05 × 81) = 16.3 ms.
The idea that a listener can compare auditory patterns by
computing the correlation between temporal or spectral aspects of
the patterns is not novel. Many models of the binaural detection
mechanism have assumed a process that computes the interaural
correlation between the left and right auditory channels (Durlach,
1963; Osman, 1971; Lindemann, 1986; and cf. Sorkin, 1965, and
Pohlmann and Sorkin, 1974). Several investigators have studied the
binaural discrimination of changes in the interaural
whole-waveform correlation of the signals (e.g., for wideband
noise, Pollack and Trittipoe, 1959; for pulse train polarity
agreement, Pollack, 1971; and for wideband, narrow-band, and
low-pass noise, Gabriel and Colburn, 1981). These studies have
reported a dependence of discrimination on interaural correlation
that is consistent with the hypothesized correlation process.

Recently, Richards (1987) reported an experiment on
the discrimination of differences between simultaneously
presented noise stimuli having partially correlated
amplitude (and spectral) envelopes. Richards postulated a
correlation discrimination process that is essentially identical to
the one proposed in the present study. Her noise stimuli had
bandwidths of 100 Hz and center frequencies of 2500 and
2750 Hz. For any given stimulus, these two noise bands had,
on average, a specified correlation. The observers had to
discriminate which of two such stimuli contained the higher
correlation across the spectral bands. Richards tested her
observers' ability to discriminate between a reference
stimulus, containing either a zero or unit noise correlation, and
target stimuli having a range of noise correlations. In
general, her results supported the model: The observers'
sensitivity to changes in envelope correlation was a monotonic
function of the computed Z statistic and was essentially
independent of the specific reference correlation.

In the binaural studies and in Richards's noise study, one
assumes that the listener can compute the correlation
between the transduced, critical-band-filtered signals; the
signals are assumed to undergo minimal processing prior to
the correlation operation. A similar process could be
operating in the present study: The signals in each sequence are
transduced, subjected to windowing and filtering
operations, and then stored; finally, the correlation is computed
between the resulting waveforms. An alternative, more
cognitive, conception is that the listener processes each
sequence so that only the magnitudes of the time intervals
between tone onsets are encoded and stored. The listener
then computes the correlation between the two lists of
interonset times. This view of the correlation process implies
different relationships between performance and the task
characteristics. In contrast to the whole-waveform correlation,
the computation of correlation based on two lists of stored
numbers should be less sensitive to certain transformations
of the sequences such as temporal compression or expansion.
A future experiment will examine this idea.

The listener's subjective impression of the present task
is of trying to recall and compare two briefly heard rhythmic
patterns. That observation, and the relatively long interonset
intervals employed in the current experiment, support the
idea that the listener is using a temporal rather than spectral
processing mode. In addition, changing the frequency of all
of the tones in the second sequence has a negligible effect on
performance. Even so, we would expect the simple
correlation model to fail when the sequences are composed of tones
of more than a single frequency. Many studies of the
perception and production of temporal patterns have demonstrated
the influence of sequence temporal structure on spectral
pattern discrimination (Deutsch, 1980; Jones, 1981; Jones et
al., 1981; Jones et al., 1982; and Monahan, 1987) as well
as the influence of sequence spectral pattern on temporal
pattern discrimination (Woods et al., 1979; Handel and
Lawson, 1983; Espinoza-Varas and Jamieson, 1984;
Espinoza-Varas and Watson, 1986; and Sorkin, 1987).

The model of temporal jitter detection supported by the
Sorkin et al. (1982) study assumed that best performance
would occur when the tones marking the intervals were
within a critical band in frequency. In that experiment, the
detection of jitter in sequences containing different
frequency tones was predictably poorer than with equitone
sequences. It is possible that a similar assumption would
enable the correlation model to describe pattern comparisons
between multiple-frequency tone sequences.

For example, the listener might compute the correlation
between the temporal envelopes of tone subsequences
defined only within a single critical band. Correlations
computed within separate critical bands then could be combined,
in order to arrive at a composite estimate of the temporal
similarity of the sequences.

APPENDIX

The gap mean, standard deviation, and correlation were
controlled by generating the gap durations in the following
manner: Three independent normal deviates, $x_a$, $x_b$, and $x_c$,
were generated and their absolute values added to arrive at
random variables with a correlation of $\rho_{ex}$:

$$x_1 = u|x_a| + c|x_c|, \tag{A1}$$

$$x_2 = u|x_b| + c|x_c|, \tag{A2}$$

where $u$ and $c$ are constants defined by

$$c = \rho_{ex}^{1/2}, \qquad u = (1 - \rho_{ex})^{1/2}. \tag{A3}$$

The resulting $x_1$ and $x_2$ values were limited to values
between zero and 2.5 ($p < 0.02$) and then linearly transformed
to arrive at gap sequences $\{t_{1i}\}$ and $\{t_{2i}\}$ with gap mean
equal to $\mu_{gap}$ and standard deviation equal to $\sigma_{gap}$. To check
these procedures, we computed the sample correlation
coefficients $r_{12}$ and the distributions of $Z$ [Eq. (3)]; the $t_1$ and $t_2$
sequences had an average correlation equal to $\rho_{ex}$, and the $Z$
distributions were approximately normal.

II. EFFECT OF TIME COMPRESSION AND EXPANSION ON THE
DISCRIMINATION OF TONAL PATTERNS

ABSTRACT 

This  experiment  tested  how  well  human  listeners  can 
discriminate  between  temporal  patterns  that  are  compressed  or 
expanded  in  time.  The  listener's  task  was  to  determine  whether 
two  arrhythmic,  tonal  sequences  had  the  same  or  different 
temporal  patterns.  According  to  the  Pattern  Correlation  model 
[Sorkin, J. Acoust. Soc. Am. 87 (1990)], listeners perform this
task  by  computing  the  correlation  between  the  pattern  of  time 
intervals  marked  by  the  tones  in  each  sequence.  Listener 
performance  dropped  when  one  of  the  sequences  was  compressed  or 
expanded  in  time.  In  order  for  the  model  to  describe  the 
observed  performance,  it  was  necessary  to  postulate  an  internal 
noise  component  that  was  proportional  to  the  magnitude  of  the 
difference  between  the  sequence  transformations. 

INTRODUCTION 

One  of  the  most  intriguing  features  of  temporal  pattern 
perception  is  the  ability  to  recognize  patterns  as  similar, 
despite  time  compression  or  expansion.  Examples  of  such  time 
normalization  abound  in  speech  and  music  perception  and  we  are 
normally  unaware  of  such  temporal  changes,  even  when  they  occur 
during  relatively  brief  stimuli,  such  as  words.  Our  experiments 
evaluated  the  human  listener's  ability  to  discriminate  between 
word-length  tonal  sequences  that  were  subjected  to  such 
transformations.  We  also  tested  how  well  the  Pattern  Correlation 
model  (Sorkin,  1990)  would  predict  the  effects  of  time 
compression  and  expansion  on  listeners'  performance. 

A  number  of  investigators  have  examined  the  human  listener's 
sensitivity  to  the  temporal  properties  of  non-repeating,  multi- 
tone,  arrhythmic,  sequences.  In  these  experiments,  the  listener 
is  asked  to  detect  a  small  temporal  difference  or  jitter  in  the 
patterns.  Performance  is  a  Weber-like  function  of  the  time 
intervals  between  marker  tones  (Halpern  and  Darwin,  1982;  Hirsch, 
Monahan,  Grant,  and  Singh,  1990;  Lunney,  1974;  Pollack,  1967, 
1968a, b,c;  Schulze,  1989;  and  Sorkin,  1990).  The  Weber  ratio 
(delta-t/T)  typically  varies  from  approximately  5%  to  20%, 
depending  on  the  particular  task  conditions.  Similar  results 
have  been  obtained  in  experiments  using  single  marked  intervals 
(Abel, 1972a,b; Creelman, 1962; Getty, 1975; Divenyi and Danner,
1977; Divenyi and Sachs, 1978; Espinoza-Varas and Jamieson, 1984;
and the review by Allan, 1979).

Temporal  pattern  discrimination  also  depends  on  the  number 
of marked intervals (Schulze, 1989; Sorkin, 1982), the spectral
structure  of  the  marker  pattern  (Bregman  and  Campbell,  1971; 
Bregman  and  Dannenbring,  1973;  Espinoza-Varas  and  Watson,  1986; 
Preusser,  1972;  Royer  and  Garner,  1966,  1970;  Sorkin,  1982;  and 
Woods  et  al.,  1979),  the  temporal  structure  of  the  markers 
(Bharucha and Pryor, 1986; Bregman, 1990; Deutsch, 1980; Monahan
and Hirsch, 1990; Jones et al., 1981; Jones et al., 1982; Sturges
and  Martin,  1974;  Monahan,  Kendall,  and  Carterette,  1987),  and 
the  temporal  location  of  the  information  in  the  stimulus  sequence 
(Espinoza-Varas  and  Watson,  1986;  Hirsch,  Monahan,  Grant,  and 
Singh,  1990;  Watson  et  al.,  1975,  Watson  et  al.,  1976). 

Kidd and Watson (1988) tested listeners' ability to detect
frequency changes in 5-tone random patterns that were exposed to
frequency transpositions and/or temporal expansions. These
transformations involved multiplying the frequency or duration of
one of the patterns by a constant factor between 1.12 and 2.
They reported that performance decreased as a function of the
magnitude of the transformation, with frequency transposition
producing a larger degradation than temporal expansion. Sorkin
and Snow (1987) reported a similar experiment in which listeners
had to indicate whether two 8-tone sequences of 50-ms tones had
the same or different frequency patterns. All tone and gap
durations in the second sequence were expanded or compressed by
up to 40%. They reported a drop in performance that was
dependent on the magnitude of the time transformation.

In  the  present  experiments,  the  listener  is  presented  with 
two,  successively  played,  arrhythmic  tonal  sequences.  The  series 
of  time  intervals  between  the  tone  onsets  in  each  sequence  define 
the  temporal  patterns  to  be  discriminated.  On  half  of  the  trials 
(SAME  trials)  these  two  temporal  patterns  are  identical,  and  on 
half  of  the  trials  (DIFFERENT  trials)  the  patterns  are  different; 
the listener must report which condition exists. An important
experimental variable is the correlation, $\rho$, between the two
series of tone interonset times, $\{X_1\}$ and $\{X_2\}$. On SAME trials,
$\rho = 1.0$, and on DIFFERENT trials, $\rho_{ex}$ is set at a constant
value, less than 1.0, that depends on the particular condition.
The task is easiest when $\rho_{ex} = 0$, and becomes more difficult as
$\rho_{ex}$ approaches unity.

We  take,  as  a  working  assumption,  that  the  perceived 
difference  between  temporal  patterns  (absent  other  cues,  such  as 
changes  in  amplitude  or  frequency)  is  dependent  on  the  listener's 
estimate  of  the  correlation  between  the  two  series  of  interonset 
times.  The  listener  could  estimate  this  correlation  by  computing 
the  Pearson  product-moment  correlation  coefficient  on  the  lists 
of  (transduced  and  encoded)  interonset  times.  Distortions  of  the 
tonal  sequences,  such  as  temporal  expansion  or  compression, 
should  affect  listener  performance  only  to  the  extent  that  such 
transformations  produce  changes  in  the  estimated  correlation. 

The  ability  to  estimate  the  correlation  between  brief 
temporal  patterns  is  suggested  by  experiments  on  speech 
perception.  Apparently,  listeners  can  calibrate  or  normalize  an 
incoming  speech  signal  for  the  unique  timing-related  properties 
of  the  speaker;  this  process  enables  the  listener  to  modify  the 
interpretation  of  important  phonetic  cues,  such  as  voice  onset 
time  (Miller  and  Liberman,  1979;  Miller  and  Dexter,  1988;  Miller 
and  Grosjean,  1981;  and  see  the  review  by  Miller,  1987).  Miller 
and Volaitis (1989) showed that information about speaking rate
(syllable duration) may be used to shift the category boundary
that defines whether consonants are identified as voiced or
voiceless (i.e., /b/ vs. /p/). The listener could implement this
process  by  estimating  and  comparing  phonetic  intervals  in  a 
relative  fashion,  by  computing  ratios  of  critical  intervals  to 
reference  intervals.  These  relative  intervals  would  then  be 
compared  to  the  list  of  relative  intervals  that  characterize  the 
phonetic  prototypes.  This  is  analogous  to  a  correlation 
computation. 

One  aspect  of  our  ordinary  experience  with  musical  rhythm  is 
its apparent insensitivity to small changes in rate or tempo.
The importance of relative, rather than absolute, time in musical
perception was stressed by Handel (1989), in a discussion of
factors  determining  the  rhythmic  character  of  auditory  sequences. 
However,  Handel  pointed  out  that  musical  rhythm  may  not  be 
independent  of  tempo,  since  changes  in  tempo  can  produce  changes 
in  the  perceived  dissimilarity  of  melodies.  For  example, 
Gabrielsson  (1973)  performed  a  multidimensional  scaling  analysis 
on  a  number  of  sets  of  different  rhythmic  patterns.  Subjects  had 
to  rate  the  similarity  between  pairs  of  patterns  drawn  from  each 
set.  He  found  that  changes  in  metronomic  tempo  produced  effects 
on  the  subjects'  similarity  space  at  least  as  large  as  those 
produced  by  differences  in  the  meter  or  temporal  pattern  of  the 
stimuli. Gabrielsson's task is quite different from the current
sequence discrimination tasks. A comparable scaling task would
require listeners to rate the similarity of rhythmic patterns
while under instructions to ignore differences in tempo.

We  report  on  two  experiments  in  this  paper:  In  the  first 
experiment  we  evaluated  the  effects  of  multiplicative  time 
transformations  to  the  tonal  patterns  to  be  discriminated.  That 
is,  we  test  the  effects  of  multiplying  all  time  intervals  in  both 
sequences,  or  in  the  second  sequence  alone,  by  a  fixed  constant. 
In  the  second  experiment  we  test  the  effects  of  adding  a  fixed 
time  interval  to  all  times  in  both  sequences,  or  to  the  second 
sequence  alone.  We  also  derive  predictions  for  the  behavior  of 
the  Pattern  Correlation  model,  under  the  assumption  of  internal, 
uncorrelated  noise. 

METHOD 

Two  groups  of  subjects  participated  in  the  experiment.  The 
first  group  consisted  of  one  male  and  three  females;  the  second 
consisted  of  two  females  and  two  males.  One  of  the  female 
subjects  in  the  first  group  (MW)  also  served  in  the  second  group. 
All  subjects  were  students  at  the  University  of  Florida.  They 
were  paid  an  hourly  wage  plus  an  incentive  for  correct  responses. 
All  the  subjects  had  normal  hearing  and  performed  the  tasks  for 
approximately  2  h  per  day,  3  days  per  week.  Subjects  were  seated 
in  a  double-walled  acoustically  insulated  chamber.  The  stimuli 
were  presented  monaurally  via  TDH-39  headphones.  The  conditions 
were run in 100-trial blocks; typically, 8 blocks were completed
in a session. All independent variables (such as correlation and
magnitude of time transformation) were held constant within a
block of trials. Full feedback about the correct response was
provided after each trial.

The subjects compared pairs of tone sequences composed of
eight 1000-Hz tone bursts presented at 71-dB sound-pressure level.
The tone bursts were shaped by a 4-ms linear rise and decay
envelope. An interval of 825 ms separated the pair of tone
sequences. After listening to the pair of sequences on each
trial, the subject indicated whether the temporal pattern
of tones was the same or different. On a random half of the
experimental trials, the temporal patterns were the same (SAME
trials); that is, the sequence pattern correlation, $\rho_{ex}$, was 1.0.
On half of the trials the patterns were different (DIFFERENT
trials); that is, $\rho_{ex}$ was fixed at either 0.2, 0.4, or 0.6. The
average time interval between tone onsets varied from 60 ms to
120 ms, depending on the condition. The minimum interval between
tones (offset to onset) was 2 ms. The first part of Figure 1
illustrates a SAME trial; the second part illustrates a DIFFERENT
trial.


[Figure 1: panel (a) SAME, panel (b) DIFFERENT]

Figure 1. The envelopes of typical tone sequences are shown for
same (a) and different (b) trials.

The time intervals between tones were generated by a process
that enabled control of the mean and standard deviation of the
intertone intervals. The sequences were generated by combining
three independent normal random variables, $X_a$, $X_b$, $X_c$, where
$\sigma_a^2 = \sigma_b^2$. The random variables were combined to form the two
sequences of interonset times $\{X_1\}$ and $\{X_2\}$ in the following
manner:

$$X_1 = X_a + X_c \quad \text{and} \quad X_2 = X_b + X_c \tag{1}$$

The variance of the interonset times, $\sigma_{ex}^2$, is

$$\sigma_{ex}^2 = \mathrm{Var}[X_1] = \mathrm{Var}[X_2] = \mathrm{Var}[X_a + X_c] = \sigma_a^2 + \sigma_c^2 \tag{2}$$

The correlation between the sequences is determined by the ratio
of the variance common to the two sequences, divided by the sum
of the common and unique variances (see Section V and also
Jeffress and Robinson, 1962):

$$\rho_{ex} = \frac{\sigma_c^2}{\sigma_a^2 + \sigma_c^2} \tag{3}$$

For an ideal listener, i.e., one having no internal noise, the
actual correlation on SAME trials would be equal to 1.0 and on
DIFFERENT trials would be equal to $\rho_{ex}$ (see Section V).
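
As a rough illustration of the generation scheme described above, the following Python sketch draws the shared and unique components with variances chosen so that the two interonset lists have a target correlation and standard deviation. The function names, the way the mean interval is added, and the omission of the 2-ms minimum-gap limit are assumptions made for illustration; they are not taken from the original procedure.

```python
import math
import random


def pearson(a, b):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def make_interonset_lists(n, rho_ex, sigma_ex, mean_iot, rng):
    """Build X1 = Xa + Xc and X2 = Xb + Xc around an assumed common mean."""
    sd_common = sigma_ex * math.sqrt(rho_ex)        # sd of the shared part Xc
    sd_unique = sigma_ex * math.sqrt(1.0 - rho_ex)  # sd of Xa and of Xb
    x1, x2 = [], []
    for _ in range(n):
        xc = rng.gauss(0.0, sd_common)
        x1.append(mean_iot + rng.gauss(0.0, sd_unique) + xc)
        x2.append(mean_iot + rng.gauss(0.0, sd_unique) + xc)
    return x1, x2


# Many intervals are drawn here only so that the sample statistics settle;
# an actual trial in the experiment used 8-tone sequences.
rng = random.Random(12345)
x1, x2 = make_interonset_lists(50000, rho_ex=0.4, sigma_ex=25.0,
                               mean_iot=75.0, rng=rng)
r = pearson(x1, x2)
```

With a large number of intervals the sample correlation `r` should settle near the requested value, which mirrors the check on the generated sequences that the text describes.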

PATTERN  CORRELATION  MODEL 

Sequence  discrimination  can  be  modeled  by  a  process  that 
estimates  the  correlation  between  the  two  series  of  tone 
interonset  times  (Sorkin,  1990) .  The  main  assumption  of  this 
model  is  that  the  listener's  decision  is  based  on  the  Pearson 
product-moment  correlation  computed  on  the  interonset  intervals 
and  that  the  listener's  performance  is  limited  by  internal  noise. 
In  the  study  by  Sorkin  (1990),  the  listener's  performance  was 
specified  by  a  single  parameter:  the  magnitude  of  the  temporal 
jitter  in  the  listener's  encoding  of  the  time  between  tone 
markers. This jitter was approximately 15 ms when the average
time interval between tone onsets was 85 ms. Performance dropped
when  the  intertone  interval  was  increased,  indicating  that  the 
internal  noise  was  an  increasing  function  of  the  duration  of  the 
intertone  interval. 

One  of  the  goals  of  the  present  experiment  was  to  evaluate 
the  effects  of  uniform  time  expansions  or  compressions  on  the 
performance of the pattern correlation model. Suppose that we
have two lists of numbers, $\{X_1\}$ and $\{X_2\}$, and that we wish to
estimate the correlation between the lists, $\rho_{x_1 x_2}$. Our estimate
of $\rho_{x_1 x_2}$ should not be affected by multiplying all items in $\{X_1\}$
by the factor $k_1$, and all items in $\{X_2\}$ by the factor $k_2$. The
same would be true if all the $\{X_1\}$ were increased by the additive
constant, $t_1$, and all the $\{X_2\}$ were increased by the additive
constant, $t_2$.

These predictions may change if our estimates of the $X_i$ are
degraded by internal noise, because the nature and magnitude of
the internal noise influence how accurately we can estimate $\rho_{x_1 x_2}$.
For example, suppose that there is a fixed internal noise that is
independent of the magnitude of the intervals to be judged. An
expansive transformation to both sequences, e.g., multiplying all
the elements of $\{X_1\}$ and $\{X_2\}$ by the same factor, $k_1 = k_2 = k_m$,
where $k_m$ is greater than 1, will improve the accuracy of the
correlation estimate. This is because increasing the element
magnitudes, prior to adding the internal noise, reduces the
influence of the internal noise on the estimate of $\rho$. The
opposite result would obtain if $k_m$ were less than unity. On the
other hand, additive transformations, such as $t_1$ and $t_2$, should
have no effect on performance, because such transformations have
no effect on the variances of the element lists.
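
The two claims above can be checked numerically with a short Python sketch. It first confirms that scaling and shifting noise-free lists leaves the Pearson correlation unchanged, and then simulates a SAME trial in which independent noise is added after both lists are scaled by the same factor. The noise level of 15 ms, the number of simulated intervals, and the function names are assumptions chosen only for illustration.

```python
import math
import random


def pearson(a, b):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def same_trial_r(k, sigma_ex=25.0, sigma_in=15.0, n=20000, seed=1):
    """Sample correlation on a SAME trial when both lists are scaled by k
    and independent, fixed-size internal noise is added after the scaling."""
    rng = random.Random(seed)
    pattern = [rng.gauss(75.0, sigma_ex) for _ in range(n)]
    first = [k * x + rng.gauss(0.0, sigma_in) for x in pattern]
    second = [k * x + rng.gauss(0.0, sigma_in) for x in pattern]
    return pearson(first, second)


# Noise-free lists: positive scale factors and additive shifts leave r unchanged.
rng = random.Random(0)
x1 = [rng.gauss(75.0, 25.0) for _ in range(1000)]
x2 = [v + rng.gauss(0.0, 25.0) for v in x1]
r_plain = pearson(x1, x2)
r_transformed = pearson([1.4 * v + 30.0 for v in x1],
                        [0.6 * v - 7.0 for v in x2])

# Fixed internal noise: expansion raises, and compression lowers, the correlation.
r_compressed = same_trial_r(0.6)
r_expanded = same_trial_r(1.4)
```

Under these assumptions `r_plain` and `r_transformed` agree to within rounding error, while `r_expanded` exceeds `r_compressed`, which is the pattern the paragraph predicts for a fixed internal noise.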

The effects of duration transformations to the first and
second sequences are derived in Section V. The derivation
results in the following equations which describe the effective
correlation between the sequences on SAME and DIFFERENT trials:

$$\rho_{SAME} = \frac{k_1 k_2 \sigma_{ex}^2}{\left[k_1^2 \sigma_{ex}^2 + \sigma_{in}^2\right]^{1/2} \left[k_2^2 \sigma_{ex}^2 + \sigma_{in}^2\right]^{1/2}} \tag{4}$$

and

$$\rho_{DIFF} = \rho_{ex}\, \rho_{SAME} \tag{5}$$

where $\sigma_{ex}^2$ is the experimental variable defined in equation 2, and
$\sigma_{in}$ is the internal noise. The additive constants $t_1$, $t_2$ have no
effect on the correlation.
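
A minimal Python sketch of the effective correlations follows. It assumes that each encoded interval is the scaled external interval plus independent internal noise of standard deviation `sigma_in`, so that the SAME-trial correlation is the shared covariance divided by the product of the two encoded standard deviations, and the DIFFERENT-trial correlation is that value multiplied by the external correlation. The function and parameter names are illustrative only.

```python
import math


def rho_same(k1, k2, sigma_ex, sigma_in):
    """Effective SAME-trial correlation under the additive-noise assumption."""
    shared = k1 * k2 * sigma_ex ** 2
    sd_first = math.sqrt(k1 ** 2 * sigma_ex ** 2 + sigma_in ** 2)
    sd_second = math.sqrt(k2 ** 2 * sigma_ex ** 2 + sigma_in ** 2)
    return shared / (sd_first * sd_second)


def rho_diff(rho_ex, k1, k2, sigma_ex, sigma_in):
    """Effective DIFFERENT-trial correlation: rho_ex scaled by rho_same."""
    return rho_ex * rho_same(k1, k2, sigma_ex, sigma_in)


# With no internal noise the SAME-trial correlation is 1 for any k1, k2 > 0.
noise_free = rho_same(1.0, 1.4, sigma_ex=25.0, sigma_in=0.0)

# Internal noise equal to the external sd halves the correlation at k1 = k2 = 1.
equal_noise = rho_same(1.0, 1.0, sigma_ex=25.0, sigma_in=25.0)
```

Additive constants do not appear in either function because they change neither the covariance nor the variances.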

We assume that the listener estimates $\rho_{x_1 x_2}$ by computing the
Pearson product-moment correlation between the two patterns of
interonset times. The Fisher r-to-Z transformation (see Section
V) yields a normal decision variable Z with known mean and
standard deviation. We can predict the listener's performance by
taking the normalized difference between the means of the Z
statistic on SAME and DIFFERENT trials:

$$d' = (n-3)^{1/2} \left[ \frac{1}{2} \ln\!\left(\frac{1+\rho_{SAME}}{1-\rho_{SAME}}\right) + \frac{\rho_{SAME}}{2n-1} - \frac{1}{2} \ln\!\left(\frac{1+\rho_{DIFF}}{1-\rho_{DIFF}}\right) - \frac{\rho_{DIFF}}{2n-1} \right] \tag{6}$$

where $n$ is the number of interonset intervals and $\rho_{SAME}$ and $\rho_{DIFF}$
are defined by equations 4 and 5.
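
The prediction step can be sketched in Python as the difference between the mean Fisher Z values on SAME and DIFFERENT trials, divided by the standard deviation of Z, which is taken here as 1/sqrt(n - 3). The small bias term is written as rho/(2n - 1), following the equation as printed; the common textbook approximation uses rho/(2(n - 1)), so this term should be treated as an assumption. The function names are illustrative.

```python
import math


def mean_fisher_z(rho, n):
    """Approximate mean of Fisher's Z for population correlation rho.

    The bias term rho / (2n - 1) follows the printed equation and is an
    assumption; it requires -1 < rho < 1.
    """
    return math.atanh(rho) + rho / (2 * n - 1)


def d_prime(rho_same, rho_diff, n):
    """Normalized separation of the Z means on SAME and DIFFERENT trials."""
    separation = mean_fisher_z(rho_same, n) - mean_fisher_z(rho_diff, n)
    return math.sqrt(n - 3) * separation


# Example: 8 tones give n = 7 interonset intervals.
example = d_prime(rho_same=0.5, rho_diff=0.2, n=7)
```

Because `math.atanh` diverges at 1, the sketch applies only when internal noise keeps the SAME-trial correlation below unity.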

EXPERIMENT  1.  EFFECT  OF  TIME  COMPRESSION  OR  EXPANSION 

The  purpose  of  this  experiment  was  to  examine  the  effect  of 
uniform time compression or expansion of the sequences to be
discriminated.

A.  Procedure 

In order to test the effects of multiplicative
transformations on discrimination, listeners were run under two
experimental conditions: (a) control conditions in which both of
the sequences on each trial received the same multiplicative time
transformation, $k_1 = k_2 = k_m$; and (b) test conditions, in which
only the second sequence of the pair on each trial was compressed
or expanded, $k_1 = 1.0$, $k_2 = k_m$. The test and control conditions
were run under three values of pattern correlation; for all
values of $k_m$, $\rho_{ex}$ was set equal to 0.2, 0.4, or 0.6. The control
conditions were run with $k_m$ equal to 0.6, 0.8, 1.0, 1.2, and 1.4,
and the test conditions were run with $k_m$ equal to 0.6, 0.8, 0.9,
1.1, 1.2, and 1.4. The correlation and transformation levels
were fixed within each block of trials. The nominal duration of
the tones was 25 ms and the nominal duration of the mean
interonset interval ($\mu_{IOT}$) was 75 ms. The nominal value of $\sigma_{ex}$ was
25 ms. These durations were scaled proportionately by the value
of the multiplicative constant, $k$. At least four blocks of 100
trials were run at each experimental condition. Listeners ran
several thousand trials before data collection was begun; no
effect of practice was evident on discrimination performance.

The data indicated no consistent relationship between the
listeners' response criteria and the duration transformation
condition.

Figure 2. The performance (d') of two listeners (MW: panels A, C;
ML: panels B, D) is plotted as a function of the uniform time
expansion of both sequences (panels A, B) or the second
sequence alone (panels C, D). The triangle, circle, and
square symbols show, respectively, the data for the $\rho_{ex}$ =
0.2, 0.4, and 0.6 conditions. The brackets show plus and
minus one standard error of the mean.

Figure 3. The performance (d') of two listeners (SB: panels
A, C; HF: panels B, D) is plotted as a function of the uniform
time expansion of both sequences (panels A, B) or the second
sequence alone (panels C, D). The triangle, circle, and
square symbols show, respectively, the data for the $\rho_{ex}$ =
0.2, 0.4, and 0.6 conditions. The brackets show plus and
minus one standard error of the mean.

Figure 4. The average performance (d') of four listeners is
plotted as a function of the uniform time expansion of both
sequences (panel A) or the second sequence alone (panel B).
The triangle, circle, and square symbols show, respectively,
the data for the $\rho_{ex}$ = 0.2, 0.4, and 0.6 conditions. The
brackets show the average standard errors of the mean for
the four listeners. The solid and dashed lines in panel A
are the predictions of the correlation model fit to the
averaged data, assuming $\sigma_{in} = A + B\mu_{IOT}$, where A=6.1373 and
B=0.0528 (see text). The solid and dashed lines in panel B
are the predictions of the pattern correlation model fit to
the averaged data, assuming $\sigma_{in} = A + B\mu_{IOT} + C|k - 1|$, where
A=9.665, B=0, and C=28.93 (see text).

B. Results and Discussion

Figures  2,  3  and  4  show  the  effects  on  performance  of 
expanding  or  compressing  the  sequences.  Figures  2  and  3  show  the 
data  for  individual  listeners;  the  vertical  bars  indicate  plus 
and  minus  one  standard  error  of  the  mean.  Figure  4  shows  the 
average  performance  of  the  four  listeners;  the  vertical  bars  are 
the  average  of  the  standard  errors  of  the  four  subjects  in  each 
condition. The left-hand panels (A, B) of the figures show the
results obtained in the control conditions ($k_1 = k_2 = k_m$); the
right-hand panels (C, D) show the results obtained in the test
conditions ($k_1 = 1$, $k_2 = k_m$). All conditions showed the predicted
dependence on $\rho_{ex}$ (Sorkin, 1990).

When  both  sequences  were  transformed,  performance  increased 
with  the  magnitude  of  the  expansion;  that  is,  performance 
improved with expansion ($k_m > 1$) and decreased with compression
($k_m < 1$). This is consistent with the prediction of the pattern
correlation  model  with  a  fixed  internal  noise  component: 
expansion  of  the  interonset  duration  increases  the  external 
variance  and  thus  reduces  the  decorrelating  effect  of  internal 
noise;  compression  of  the  interonset  duration  decreases  the 
external  variance  and  increases  the  decorrelating  effect  of 
internal  noise. 

In order to evaluate the correlation model, we postulated an
internal noise having both a constant noise component, A, and a
Weber's-law component, B:

$$\sigma_{in} = A + B\mu_{IOT} \tag{7}$$

A least-squares fit of equations 4, 5, 6, and 7 to the averaged
data from the multiplicative condition yielded the values
A = 6.73 ms and B = 0.053. The Weber component contributed about 37%
of the internal noise. The predictions of the model are shown as
the curves on the left-hand panel of figure 4. Except for the
0.4 condition, the fit is not very good; however, the increase of
performance with the magnitude of expansion is approximated by
the model.
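
The 37% figure can be reproduced with a few lines of Python, assuming the internal noise is the sum of the constant component A and the Weber component B times the mean interonset interval, and assuming the share is evaluated at the nominal mean interonset interval of 75 ms. The function names are illustrative.

```python
def internal_noise_ms(a_ms, b, mean_iot_ms):
    """Internal noise as a constant part plus a Weber-law part (assumed sum)."""
    return a_ms + b * mean_iot_ms


def weber_share(a_ms, b, mean_iot_ms):
    """Fraction of the internal noise contributed by the Weber component."""
    return (b * mean_iot_ms) / internal_noise_ms(a_ms, b, mean_iot_ms)


# Fitted values quoted in the text, evaluated at the nominal 75-ms interval.
share = weber_share(a_ms=6.73, b=0.053, mean_iot_ms=75.0)
```

At the nominal interval the Weber part is about 4 ms of a total of roughly 10.7 ms, which is consistent with the quoted contribution.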

The right-hand panels of the figures show that a different
effect was produced by transforming only the second sequence of
the pair ($k_1 = 1$, $k_2 = k_m$). Under these conditions, performance
was a peaked function of the absolute magnitude of the expansion
or compression. Listener performance at compressions of 0.6 was
near the chance level. Attempts to fit the model (not shown
in figure 4b) to these data were not satisfactory, so we
considered an additional assumption about the nature of the
internal noise.

Consider an internal noise having a component that depends
on the magnitude of the difference between the transformations to
the patterns:

$$\sigma_{in} = A + B\mu_{IOT} + C|k_1 - k_2| \tag{8}$$

where A and B are as previously defined, and where C determines
the magnitude of the noise component that is attributable to the
absolute difference between the pattern transformations. The
solid and dashed lines of figure 4b were generated by fitting the
data from the three correlation conditions to a pattern
correlation model that incorporated this noise assumption. The
resulting parameters were: A = 9.66 ms, B = 0, and C = 28.93.
The contribution of the pattern difference factor to the total
internal noise ranged from zero at $k_m = 1.0$, to approximately 60% at
$k_m = 1.5$.
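
To see how a transformation-difference noise component produces a peaked performance function, the Python sketch below combines that noise term with the effective-correlation and d' expressions of the model. Several things here are assumptions made for illustration: the same internal noise is applied to both sequences, the bias term of the Z mean is written as rho/(2n - 1), n = 7 intervals corresponds to 8-tone sequences, and the function names are hypothetical.

```python
import math

A_MS, B, C_MS = 9.66, 0.0, 28.93      # fitted values quoted in the text
SIGMA_EX_MS, MEAN_IOT_MS = 25.0, 75.0  # nominal stimulus values


def internal_noise_ms(k1, k2):
    """Constant + Weber + transformation-difference noise components."""
    return A_MS + B * MEAN_IOT_MS + C_MS * abs(k1 - k2)


def predicted_d_prime(k2, rho_ex=0.2, n=7, k1=1.0):
    """Model d' for the test condition (only the second sequence scaled)."""
    s_in = internal_noise_ms(k1, k2)
    sd1 = math.sqrt(k1 ** 2 * SIGMA_EX_MS ** 2 + s_in ** 2)
    sd2 = math.sqrt(k2 ** 2 * SIGMA_EX_MS ** 2 + s_in ** 2)
    rho_same = k1 * k2 * SIGMA_EX_MS ** 2 / (sd1 * sd2)
    rho_diff = rho_ex * rho_same

    def mean_z(rho):
        return math.atanh(rho) + rho / (2 * n - 1)

    return math.sqrt(n - 3) * (mean_z(rho_same) - mean_z(rho_diff))


def difference_share(k2, k1=1.0):
    """Fraction of the internal noise due to the |k1 - k2| component."""
    return C_MS * abs(k1 - k2) / internal_noise_ms(k1, k2)


curve = {k: predicted_d_prime(k) for k in (0.6, 0.8, 1.0, 1.2, 1.4)}
share_at_1_5 = difference_share(1.5)
```

Under these assumptions the predicted d' is highest at an expansion factor of 1.0 and falls off on both sides, and the difference component accounts for roughly 60% of the internal noise at a factor of 1.5.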

EXPERIMENT  2.  EFFECT  OF  ADDITIVE  TRANSFORMATIONS 

The  purpose  of  this  experiment  was  to  examine  the  effects  of 
uniform  additions  or  reductions  in  the  interonset  durations  of 
the  sequences  to  be  discriminated,  and  to  observe  any  differences 
between  the  effect  of  additive  and  multiplicative  transformation 
of  the  sequences. 

A.  Procedure 

In order to test the effects of additive time
transformations on discrimination, listeners were run under two
experimental conditions: (a) control conditions in which a
constant time, $t_a$, was added to (or subtracted from) the
interonset times of both sequences of the pair on each trial
($t_1 = t_2 = t_a$); and (b) test conditions, in which the constant time,
$t_a$, was added to (or subtracted from) the interonset intervals
only in the second sequence of the pair on each trial ($t_1 = 0$,
$t_2 = t_a$). The test and control conditions were run under one value
of pattern correlation ($\rho_{ex} = 0.2$).

The  control  conditions  were  run  with  ta  equal  to  -15,  -7,  0, 
7,  15,  30,  and  45-ms,  and  the  test  conditions  were  run  with  ta 
equal  to  -7,  0,  7,  15,  30,  and  45-ms.  These  values  were  chosen 
in  order  to  produce  the  same  interonset  durations  employed  in 
experiment 1, at different values of k. That is, these times
produced  expansions  to  the  interonset  times  equivalent  to, 
respectively:  0.8,  0.9,  1.0,  1.1,  1.2,  1.4,  and  1.6,  for  the 
control  conditions  and  0.9,  1.0,  1.1,  1.2,  1.4,  and  1.6,  for  the 
test  conditions.  Because  it  was  impossible  to  have  interonset 
times  smaller  than  2-ms,  we  were  concerned  that  the  truncations 
of  the  interonset  time  required  by  the  use  of  conditions  with 
large negative values of ta would distort the distributions of
{X1} and {X2}. Hence, conditions that would have required
subtracting a constant time larger than 7-ms in the test (or
15-ms in the control) conditions were avoided. In addition to the
additive  conditions,  the  test  and  control  conditions  of 
experiment  1  were  repeated  with  this  group  of  listeners.  Thus,  a 
total  of  four  transformation  conditions  were  run: 

(1) Multiplicative to both sequences

(2) Multiplicative to the second sequence

(3) Additive to both sequences

(4) Additive to the second sequence.


Three  or  four  blocks  of  100  trials  were  run  at  each  experimental 
condition.  As  in  experiment  1,  the  correlation  and 
transformation  levels  were  fixed  within  each  block  of  trials. 
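
The correspondence between the additive constants and the equivalent expansion factors can be checked with a short calculation. The sketch below assumes a nominal mean interonset time of 75 ms. That value is inferred from the numbers listed above rather than stated in this section, and the function name is ours.

```python
# Equivalent expansion factor for an additive transformation:
#   k_equiv = (mean_iot + ta) / mean_iot
# MEAN_IOT_MS = 75 is an assumption inferred from the values quoted in the text.

MEAN_IOT_MS = 75.0

def equivalent_expansion(ta_ms, mean_iot_ms=MEAN_IOT_MS):
    """Expansion factor that yields the same mean interonset time as adding ta_ms."""
    return (mean_iot_ms + ta_ms) / mean_iot_ms

control_ta = [-15, -7, 0, 7, 15, 30, 45]
equivalents = [round(equivalent_expansion(ta), 1) for ta in control_ta]
print(equivalents)
```

Under this assumption the seven control values of ta map onto expansion factors of 0.8 through 1.6, matching the list given in the procedure. The equivalence holds only for the mean interonset time; the additive transformation leaves the variability of the intervals unchanged.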




Figure 5. The performance (d') of two listeners (MW: panels A,C;
SD: panels B,D) is plotted as a function of the time
expansion applied to both sequences (panels A,B) or the
second sequence alone (panels C,D). The filled circle
symbols (and solid lines) show the data for conditions when
the sequences are expanded by adding a fixed time to the
interonset times. The open square symbols (and dashed
lines) show the data for conditions when the expansion is
implemented by multiplication by a constant factor. The
brackets show plus and minus one standard error of the mean.
Listener SD did participate in the multiplicative,
second-sequence-alone condition.

Figure 6. The performance (d') of two listeners (CH: panels A,C;
SL: panels B,D) is plotted as a function of the time
expansion applied to both sequences (panels A,B) or the
second sequence alone (panels C,D). The filled circle
symbols (and solid lines) show the data for conditions when
the sequences are expanded by adding a fixed time to the
interonset times. The open square symbols (and dashed
lines) show the data for conditions when the expansion is
implemented by multiplication by a constant factor. The
brackets show plus and minus one standard error of the mean.

Figure 7. The average performance (d') of four listeners is
plotted as a function of the uniform time expansion of both
sequences (panel A) or the second sequence alone (panel B).
The filled circle symbols show the data for conditions when
the sequence times are expanded by adding a fixed time to
the sequence times. The open square symbols show the data
for conditions when the expansion is implemented by
multiplication by a constant factor. The brackets show the
average standard error of the mean for the four listeners.
The dashed line is the prediction of the correlation model,
using the parameters derived from the data of figure 4
($\sigma_{in} = 6.7373 + 0.0528\,\mu_{IOT}$). The solid line is the
prediction of the correlation model fit to the averaged additive
transformation data (A = 0, B = 0.144; see text).

B. Results and Discussion


The  left-hand  panels  (A,  B)  of  figures  5  and  6  show  the 
results  obtained  in  the  control  conditions  (multiplying  or  adding 
a  fixed  time  to  both  sequences)  on  the  performance  of  individual 
listeners.  The  right-hand  panels  (C,  D)  show  the  results 
obtained  in  the  test  conditions.  The  dashed  lines  (and  square 
symbols)  show  the  data  from  the  multiplicative  transformation 
conditions  and  the  solid  lines  (and  filled  circle  symbols)  show 
the  data  from  the  additive  conditions.  Figure  7  shows  the  data 
averaged  over  the  four  listeners  (the  data  in  figure  7b  are 
averaged over three listeners).

The  data  obtained  in  the  multiplicative  transform  conditions 
replicated  the  results  obtained  in  experiment  1,  for  the  control 
(both-sequences)  and  test  (second-sequence-alone)  conditions.  In 
figure 7a, the square symbols are the data points from the
multiplicative both-sequences condition. The dashed line in
figure 7a is the prediction of the pattern correlation model,
using the parameters obtained from the model fit to the data of
figure 4a ($\sigma_{in} = 6.7373 + 0.0528\,\mu_{IOT}$).

The  filled  circle  symbols  in  figure  7a  show  the  data 
obtained  when  both  sequences  received  the  additive 
transformation.  The  additive  transformation  produced  an  effect 
markedly  different  from  that  of  the  multiplicative 
transformation:  instead  of  an  increase  in  performance,  there  was 
a  performance  decrease  of  more  than  one  d'  unit.  Adding  a 
constant  time  interval  increases  the  average  interonset  interval 
without  increasing  the  sequence  correlation.  Thus,  if  the 
internal noise has a Weber's-law component, performance will
decrease  under  (positive)  additive  transformations. 
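
The qualitative argument can be illustrated with a rough index: the ratio of the standard deviation of the interonset pattern to the internal noise. The sketch below is ours and is not the model's d' computation. It applies the parameters fit to the multiplicative data to both transformations purely for illustration, and the stimulus mean and standard deviation (75 ms and 25 ms) are assumed values.

```python
# Illustrative index only (not the model's d' equations): the ratio of the
# pattern standard deviation to the internal noise sigma_in = A + B * mean_iot.
# MEAN_IOT_MS and SD_IOT_MS are assumed stimulus values, not taken from the text.

A, B = 6.7373, 0.0528          # fit quoted for the multiplicative data (figure 4a)
MEAN_IOT_MS, SD_IOT_MS = 75.0, 25.0

def pattern_to_noise(mean_iot, sd_iot, a=A, b=B):
    """Pattern SD divided by internal-noise SD for the given interval statistics."""
    return sd_iot / (a + b * mean_iot)

def multiplicative(k):
    # Scaling every interval by k scales both the mean and the SD.
    return pattern_to_noise(k * MEAN_IOT_MS, k * SD_IOT_MS)

def additive(ta_ms):
    # Adding a constant shifts the mean but leaves the SD unchanged.
    return pattern_to_noise(MEAN_IOT_MS + ta_ms, SD_IOT_MS)

for k, ta in [(1.0, 0), (1.2, 15), (1.4, 30), (1.6, 45)]:
    print(f"k = {k:.1f}: multiplicative {multiplicative(k):.2f}   "
          f"ta = {ta:2d} ms: additive {additive(ta):.2f}")
```

Under these assumptions the index rises with multiplicative expansion (the pattern variability grows faster than the noise whenever A is greater than zero) and falls with positive additive constants (the noise grows while the pattern variability stays fixed), which is the direction of the effects described above.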

The  solid  line  of  figure  7a  is  a  fit  of  the  model  to  the 
additive data; the parameters are: $\sigma_{in} = 0 + 0.144\,\mu_{IOT}$. In this
case,  the  Weber's  law  component  contributed  100%  of  the  internal 
noise.  We  cannot  say  why  the  Weber's  law  contribution  dominates 
the  internal  noise  in  this  case.  Although  the  variance  of  the 
interonset  interval  is  not  changed  by  the  additive 
transformation,  the  mean  time  interval  between  tone  markers 
(offset  to  onset)  and  the  duty  cycle  of  the  tone  marker  (the 
duration  of  the  tone  relative  to  the  interonset  time)  are 
changed.  Apparently,  both  of  these  variables  have  an  effect  on 
the  nature  of  the  internal  noise. 

When  the  additive  transformation  was  applied  to  the  second 
sequence  alone,  discrimination  performance  (for  three  of  the  four 
listeners)  dropped  more  than  2  d'  units.  Since  we  did  not  test 
large  negative  time  intervals,  we  could  not  determine  whether  or 
not the performance function had a maximum near k = 1. As in the
multiplicative case, performance decreased as a function of the
magnitude of the additive transformation. This performance drop
was steeper than that produced when both sequences were
transformed.

GENERAL DISCUSSION


When  both  patterns  were  transformed  by  a  multiplicative 
constant,  performance  increased  with  the  amount  of  expansion. 

When  both  patterns  received  an  additive  transformation, 
performance  decreased  with  the  size  of  the  additive  constant. 
These  results  are  generally  consistent  with  a  pattern  correlation 
mechanism  limited  by  internal  noise.  However,  our  attempt  to 
characterize  the  performance  functions  by  a  single  description  of 
the  internal  noise  was  not  successful.  The  contribution  of  the 
Weber's  law  noise  component  was  greater  in  the  additive  case  than 
the  multiplicative  case.  This  difference  may  be  related  to  the 
differences  between  the  two  conditions  in  marker  duty  cycle  and 
mean  time  interval  between  markers. 

When  only  the  second  of  the  two  sequences  was  transformed, 
performance  in  the  multiplicative  condition  was  a  peaked  function 
of  the  magnitude  of  the  transformation.  The  fit  of  the  model  to 
these data was improved by assuming an internal noise component
proportional  to  the  magnitude  of  the  transformation  difference 
between  the  patterns.  The  existence  of  an  internal  noise 
component  of  this  type  implies  that  there  is  a  processing  cost 
associated  with  certain  differences  between  the  stimuli  to  be 
compared.  Such  costs  have  been  noted  in  temporal  discrimination 
tasks  when  the  interval  markers  have  different  spectral 
properties (e.g., Divenyi and Danner, 1977; Hirsch et al., 1990;
Sorkin  et  al.,  1982)  and  in  intensity  discrimination  tasks  when 
the  two  signals  are  of  different  frequency  (Lim  et  al.,  1977). 
Some  conversion  or  normalization  is  required  when  there  are 
differences  between  the  stimuli  that  are  not  relevant  to  the 
particular  pattern  comparison;  there  may  be  internal  noise 
associated  with  the  additional  processing.  It  is  interesting 
that  in  the  present  case,  this  cost  is  approximately  a  symmetric 
function of $|k_1 - k_2|$.

An  alternative  explanation  is  that  the  listener's  use  of 
information from the temporal pattern(s) is not a uniform
function  of  the  position  of  the  information  within  the  sequence 
patterns,  as  noted  recently  by  Hirsch  et  al.  (1990).  The 
listener  may  utilize  information  from  certain  regions  of  each 
stimulus  sequence  more  than  from  others.  Temporal 
transformation  of  the  sequences  may  upset  this  temporal  position 
effect,  in  that  regions  of  maximum  attention  in  two 
differentially transformed sequences no longer coincide. The
transformation  results  in  a  misalignment  of  the  sequence 
weighting  functions. 

The  current  experiments  indicate  that  the  ability  to 
normalize  time  is  somewhat  limited.  Is  the  observed  sensitivity 
to  time  scaling  inconsistent  with  our  expectations  about  rate 
normalization  in  speech  perception?  This  question  involves  the 
complex  issue  of  the  nature  of  the  rate  normalization  mechanism 
in speech (e.g., see Diehl and Walsh, 1989; and Pisoni et al.,
1983). We restate two hypotheses that are relevant to the
question: First, it is possible that a listener can implement an
efficient time re-scaling process only for speech-like signals.
That  is,  performance  with  random  tonal  sequences  might  be 
improved  if  the  listener  somehow  could  be  induced  to  process  the 
inputs  as  if  they  were  speech  signals.  The  second  hypothesis  is 
that  the  listener's  use  of  relative  timing  information  is  no  more 
efficient  in  the  sequence  experiments  than  it  is  for  speech 
signals, but that the speech signal provides a richer source of
time-scaling information that can be used to augment the basic
timing data. Some of this information is carried by the
higher-order structure of the speech signal.

Finally,  it  is  tempting  to  try  to  generalize  the  results  of 
the  present  experiments  to  the  case  of  repeated  sequences.  The 
listener's  ability  to  discriminate  between  two  rhythmic  patterns 
may  parallel  the  ability  to  discriminate  between  the  patterns 
played  singly.  However,  caution  is  advised.  Although  the 
pattern  correlation  hypothesis  may  be  related  to  the  perception 
of  rhythmic  stimuli,  the  present  stimuli  are  not  rhythmic  (or 
metric). Repetition is generally considered to be a necessary
condition for rhythmic perception (Handel, 1989; Sturges and
Martin, 1974). Rhythmic percepts are said to emerge from the
acoustic (and subjective) context of repetitive stimuli and act
to segment and organize stimuli (Handel, 1989). Studies of
rhythm  and  meter  generally  have  been  confined  to  temporal 
patterns  that  are  repetitive.  In  our  experiments,  there  was  no 
repetition  of  the  random  temporal  pattern  on  DIFFERENT  trials, 
and  there  was  only  a  single  repetition  of  the  stimulus  pattern  on 
SAME  trials.  An  interesting  question  is  whether  the  present 
results  with  time  transformations  (and  those  reported  previously 
by Sorkin, 1990) will hold for repetitive patterns.

III. EFFECT OF INTERSEQUENCE DELAY INTERVAL ON THE DISCRIMINATION
OF TONAL PATTERNS.


ABSTRACT 

According to the Pattern Correlation model [Sorkin, J.
Acoust. Soc. Am. 87 (1990)], listeners discriminate between
arrhythmic  tonal  sequences  by  computing  the  correlation  between 
the  serial  pattern  of  time  intervals  marked  by  the  tones  in  each 
sequence.  The  present  experiments  evaluated  discrimination  when 
the  sequences  were  presented  at  different  frequencies  and  to 
different  ears.  The  sequences  began  at  delayed  starting  times 
and  were  subject  to  random  time  expansions.  When  the  delay 
between sequence onsets was less than 10-ms, discrimination
appeared to be based on comparison of the envelope of the
stimulus waveforms. At longer time separations, however,
performance was consistent with the Pattern Correlation
hypothesis.


INTRODUCTION 

In  Studies  I  and  II  (Sorkin,  1990;  Sorkin  and  Montgomery, 
submitted), we proposed a Pattern Correlation model of how
listeners discriminate between the temporal patterns formed by
two arrhythmic tonal sequences. The primary assumption of the
model  is  that  a  listener  discriminates  differences  between  the 
two  patterns  by  estimating  the  correlation  between  the  serial 
(temporal)  structure  of  the  patterns.  The  present  experiments 
compared  the  predictions  of  the  pattern  correlation  model  and  an 
alternative,  the  waveform  correlation  model,  when  the  tonal 
sequences  were  presented  at  different  frequencies  and  to 
different earphone channels.

The  basic  experimental  paradigm  is  the  same  as  in  the 
previous  studies.  The  listener  is  presented  with  two, 
successively  played,  arrhythmic  tonal  sequences.  The  series  of 
time  intervals  between  tone  onsets  in  each  sequence  define  the 
two  temporal  patterns  to  be  discriminated.  On  half  of  the  trials 
these  two  patterns  are  identical,  and  on  half  of  the  trials  the 
temporal  patterns  are  different;  the  listener  must  report  whether 
the patterns were the same or were different. The experimental
variable is the correlation, ρ, between the sequences on trials
when the sequences are different; the task is easiest when ρ
equals 0 and increases in difficulty as ρ approaches one.

A.  Comparison  of  Pattern  Correlation  Model  and  Waveform 
Correlation  Model 

The  basic  assumption  of  the  Pattern  Correlation  model  is 
that  the  listener  estimates  the  correlation  between  temporal 
patterns  by  computing  the  Pearson  product-moment  correlation 
coefficient,  r12,  on  the  transduced  and  encoded  series  of  marker 
interonset  times.  The  performance  of  the  listener  is  given  by 
equations A7, A21 and A22 in Section V. Transformations or
distortions of the tonal sequences, such as time expansions or
compressions, should affect discrimination to the extent that the
transformations  produce  differences  in  the  listener's  estimate  of 
the  correlation. 
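
The core computation assumed by the model can be sketched in a few lines. The example below is ours: it computes the Pearson correlation between two lists of interonset times after adding internal noise, which is assumed here to be independent and Gaussian. The performance equations of Section V are not reproduced, and the pattern values are hypothetical.

```python
import numpy as np

def pattern_correlation(iot_1, iot_2, sigma_in=0.0, rng=None):
    """Pearson r between two lists of interonset times (ms), each perturbed
    by independent Gaussian internal noise of standard deviation sigma_in."""
    if rng is None:
        rng = np.random.default_rng()
    x1 = np.asarray(iot_1, dtype=float)
    x2 = np.asarray(iot_2, dtype=float)
    x1 = x1 + rng.normal(0.0, sigma_in, x1.shape)
    x2 = x2 + rng.normal(0.0, sigma_in, x2.shape)
    return float(np.corrcoef(x1, x2)[0, 1])

# Hypothetical 7-interval pattern (ms); the values are made up for illustration.
pattern = np.array([62.0, 91.0, 48.0, 105.0, 77.0, 55.0, 84.0])

r_same = pattern_correlation(pattern, pattern)               # identical patterns
r_scaled = pattern_correlation(pattern, 1.2 * pattern)       # uniform expansion
r_shifted = pattern_correlation(pattern, pattern + 30.0)     # uniform addition
r_noisy = pattern_correlation(pattern, pattern, sigma_in=10.0,
                              rng=np.random.default_rng(0))  # with internal noise
print(r_same, r_scaled, r_shifted, r_noisy)
```

In the absence of internal noise the Pearson coefficient is unchanged by a uniform expansion or a uniform addition, so in this sketch any effect of such transformations on the estimated correlation arises only through the noise term.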

Study  II  tested  the  effects  of  constant  additive  and 
multiplicative transformations to the sequence time scales. In
those  experiments  all  tones  were  1000  Hz  and  the  sequences  were 
presented  monaurally,  at  a  time  separation  of  either  750-ms  or 
825-ms.  Performance  decreased  when  one  of  the  sequences  was 
compressed  or  expanded  in  time.  The  decrement  was  a  function  of 
the  magnitude  of  the  discrepancy  in  time  compression  between  the 
two  sequences;  the  amount  of  the  performance  drop  ranged  from  0 
to 2 d' units over a range of compressions from 0.6 to 1.6.
Adding an internal noise component proportional to the absolute
magnitude of the transformation difference enabled the pattern
correlation  model  to  describe  the  obtained  data. 

The  major  assumption  of  the  pattern  correlation  model  is 
that  the  listener  encodes  and  processes  a  list  of  interonset 
times  from  each  sequence;  the  listener  discards  other  information 
about  the  sequence  waveforms,  such  as  the  absolute  timing  of 
signals  or  the  signals'  spectra.  An  alternative  to  this 
mechanism  is  a  comparison  process  based  on  cross-correlation  of 
the  two  sequence  waveforms  or  their  envelopes.  A  waveform 
correlation process can provide a very sensitive measure of the
difference  between  the  waveforms  (or  waveform  envelopes)  of  the 
two  sequences. 

Because it involves a point-for-point comparison of the
sequence  waveforms,  a  waveform  correlator  may  be  very  sensitive 
to  time  transformations  such  as  compression  or  expansion.  Such 
transformations  would  result  in  temporal  misalignments  of  the 
patterns.  Temporal  misalignments  that  occur  early  in  the 
sequences  would  produce  even  greater  decorrelations  between  the 
sequences  at  later  times.  As  a  consequence,  the  performance  of  a 
waveform  correlator  may  be  seriously  degraded  by  time 
transformations.

The  correlator  could  deal  with  random  (but  uniform)  temporal 
transformations  made  to  one  of  the  two  patterns,  by  computing  a 
correlation  function.  For  example,  a  number  of  different 
expansions,  r,  could  be  applied  to  the  first  sequence,  and  then 
each  transformed  sequence  could  be  correlated  with  the  second 
sequence.  Provided  that  the  sequences  were  correlated,  the 
resulting  function  would  yield  a  well-defined  peak  at  the  value 
of  r  that  matched  the  time  transformation  to  the  second  sequence. 
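
A minimal sketch of such a correlation function follows. It is an illustration under our own assumptions, not a description of the auditory mechanism: the sequences are represented by rectangular on/off envelopes sampled at 1 kHz, the correlation is taken at zero lag, and the candidate expansions (the values of r in the text) are drawn from a small grid. The pattern values are hypothetical.

```python
import numpy as np

FS_HZ = 1000  # assumed envelope sampling rate (1 sample per ms)

def envelope(iot_ms, tone_ms, total_ms):
    """Rectangular on/off envelope of a tone sequence sampled at FS_HZ."""
    env = np.zeros(int(total_ms * FS_HZ / 1000))
    onsets = np.concatenate(([0.0], np.cumsum(iot_ms)))
    for t in onsets:
        start = int(round(t * FS_HZ / 1000))
        stop = int(round((t + tone_ms) * FS_HZ / 1000))
        env[start:stop] = 1.0
    return env

def correlation_function(iot_first, env_second, tone_ms, factors, total_ms):
    """Zero-lag correlation between env_second and the first sequence
    re-rendered at each candidate expansion factor."""
    return [float(np.corrcoef(envelope(f * iot_first, f * tone_ms, total_ms),
                              env_second)[0, 1]) for f in factors]

# Hypothetical pattern of 7 interonset times (ms); values are illustrative only.
iot = np.array([62.0, 91.0, 48.0, 105.0, 77.0, 55.0, 84.0])
TONE_MS, TOTAL_MS, TRUE_FACTOR = 25.0, 1000.0, 1.2
second = envelope(TRUE_FACTOR * iot, TRUE_FACTOR * TONE_MS, TOTAL_MS)

factors = [0.8, 0.9, 1.0, 1.1, 1.2]
corr = correlation_function(iot, second, TONE_MS, factors, TOTAL_MS)
best = factors[int(np.argmax(corr))]
print(dict(zip(factors, np.round(corr, 2))), "peak at", best)
```

The untransformed comparison (factor 1.0) gives a low envelope correlation because the misalignment accumulates over the sequence, whereas the function peaks at the factor that matches the expansion applied to the second sequence.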

Under  some  conditions,  a  listener  may  use  the  binaural 
system  to  compute  the  cross-correlation  between  two  input 
waveforms  or  their  envelopes.  For  example,  in  many  binaural 
detection  situations,  the  auditory  system  behaves  as  though  it 
computes  a  correlation  between  the  inputs  to  the  respective 
earphone channels (Colburn and Durlach, 1978). In general, the
binaural mechanism can be used if the sequences are presented
separately and almost simultaneously to the two ears. Under
these conditions, the listener can make very precise
determinations  of  differences  between  the  sequences.  A  delay  in 
the  second  signal  of  longer  than  about  15  milliseconds  would  be 
expected  to  exceed  the  limits  of  this  system  (Bilsen  and 
Goldstein, 1974). In addition, there is evidence that binaural
comparisons  can  be  performed  when  the  signals  are  at  high  (and 
different)  frequencies  in  the  two  ear  channels  (McFadden  and 
Pasanen,  1974,  1975,  1978). 

Suppose  that  one  of  two  stimulus  sequences  to  be  compared 
has  been  transformed  by  a  uniform  time  compression  or  expansion. 
We  would  expect  that  such  a  transformation  would  produce  a 
percept  similar  to  that  produced  by  stimulating  each  ear  with 
uncorrelated noise; that is, the lower the correlation between the
signals, the more spatially diffuse will be the percept (the
higher the correlation, the more spatially focused). Although
the effect of such transformations to one of two binaural inputs
has not been tested directly, it seems clear that the system will
not be capable of forming a spatially focused percept.

Basically,  the  system  would  be  presented  with  two  sequences 
composed  of  sinusoid  pulses  of  different  frequency  in  each 
channel.  The  only  time-coherent  aspect  of  this  stimulus  would  be 
the  onset  time  for  the  first  tone  marker.  The  onset  time  for  the 
first  tone  would  be  coherent  whether  or  not  the  temporal  pattern 
was  the  same  or  different  on  a  trial.  Thus,  we  would  not  expect 
the binaural comparison mechanism to be able to accommodate
compressive or expansive transformations to one of the two
patterns  to  be  compared;  under  those  conditions,  performance  on 
sequence  comparison  tasks  should  be  adversely  affected. 

B.  Experimental  Plan 

The  present  experiment  combines  conditions  in  which:  (a)  the 
sequences  are  presented  at  different  intersequence  delay 
intervals (ISIs), and (b) the second sequence is temporally
compressed  or  expanded.  Three  general  factors  should  affect 
performance  when  the  delay  of  the  sequence  starting  times  or 
intersequence  delay  interval  is  manipulated:  (1)  mechanism,  (2) 
masking,  and  (3)  memory. 

The  putative  effects  of  mechanism  on  performance  at 
different  intersequence  intervals  have  already  been  mentioned. 
Short  intersequence  intervals  (less  than  20-ms)  should  allow 
operation  of  the  binaural  comparison  mechanism;  long  intervals 
will  preclude  operation  of  the  binaural  comparison  mechanism,  but 
still  allow  operation  of  the  pattern  correlation  comparator.  The 
second  factor,  masking,  involves  energetic  masking  (and  related 
interference  effects)  between  two  signals  presented  to  the 
auditory  system  at  short  time  separations.  In  the  current 
experiment,  the  signals  will  be  presented  to  different  earphone 
channels  and  at  different  frequencies  (in  different  critical 
bands). This mode of presentation should minimize the effects of
peripheral sensory interference between the signals (see Sorkin,
1966). Finally, memory decrements related to the storage of
information should increase as the ISI increases (Sorkin, 1982),
but should be minimal at ISIs of less than five hundred
milliseconds.

Experiment  1  tests  whether  sequence  pattern  discriminations 
are  feasible  when  the  sequences  are  presented  to  different 
auditory  channels  and  at  different  frequencies.  Experiment  1 
also  evaluates  the  effects  of  short  and  long  sequence  delays. 
Experiment  2  evaluates  the  interacting  effects  of  intersequence 
delay  and  random  temporal  transformations  on  discrimination 
performance.  Experiment  3  evaluates  the  interacting  effects  of 
temporal  transformation  and  frequency  uncertainty. 

METHOD 

Two groups of subjects participated in these experiments.
The first group consisted of one male and three females; the
second  consisted  of  two  of  the  original  females  plus  two  new 
female  subjects.  All  subjects  were  undergraduate  students  at  the 
University  of  Florida.  They  were  paid  an  hourly  wage  plus  an 
incentive  for  correct  responses.  Listeners  had  normal  hearing 
and  performed  the  tasks  for  approximately  2  h  per  day,  3  days  per 
week.  Listeners  were  seated  in  a  double-walled  acoustically 
insulated  chamber.  The  stimuli  were  presented  dichotically  via 
TDH-39  headphones.  The  conditions  were  tested  in  blocks  of  100 
trials;  typically,  8  blocks  were  completed  in  a  session.  Except 
in  the  uncertain  duration  conditions  of  experiment  2,  all 
independent  variables  were  held  constant  within  a  block  of 
trials.  Full  feedback  about  the  correct  response  was  provided 
after  each  trial. 

The subjects compared pairs of tone sequences composed of 8
sinusoidal  bursts  of  nominal  durations  of  either  25-ms  (in 
experiment  1)  or  30-ms  (in  experiment  2).  After  listening  to 
each  pair  of  sequences,  the  subject  had  to  indicate  whether  or 
not  the  temporal  pattern  of  tone  and  intertone  intervals  was  the 
same  or  different  for  the  two  sequences.  On  a  random  half  of  the 
experimental trials, the temporal patterns were the same, i.e.,
ρ = 1.0. On half of the trials the patterns were different,
i.e., ρ was equal to 0.2, 0.4, or 0.6 for a block of 100
trials in experiment 1 and 0, 0.4, or 0.8 for blocks of trials in
experiment  2  or  3.  The  tone  bursts  in  the  first  sequence  were  at 
1000  Hz  and  approximately  71  dBA  SPL,  and  the  tones  in  the  second 
sequence  were  at  2300  Hz  and  approximately  68  dBA  SPL.  All  tone 
bursts were shaped by a 4-ms linear rise and decay envelope. The
first  sequence  was  always  directed  to  the  left  headphone  and  the 
second  to  the  right.  The  onset  (first  marker  tone)  of  the  second 
sequence was presented at delays (ISIs) from 0 to 2.5 seconds,
relative to the onset of the first marker tone of the first
sequence.

The  time  intervals  between  tones  were  generated  by  a  process 
that  enabled  experimenter  control  of  the  statistics  of  the 
temporal  pattern:  the  mean  and  standard  deviation  of  the 
intertone interval, and the correlation, ρ, between the
patterns. The process is described in Sorkin (1990) and is
summarized in Section V. The nominal mean time gap between tones
was 50 ms and the nominal standard deviation of this gap was
25 ms or 20 ms; gap durations of less than 2 ms were not allowed.
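
For readers who want a concrete picture of such a stimulus generator, the sketch below draws a pair of gap sequences with a specified correlation. It is not the process of Sorkin (1990) or Section V, which is not reproduced here. It is a standard bivariate-normal construction, and two details are our assumptions: the gaps are treated as Gaussian, and gaps shorter than 2 ms are raised to 2 ms rather than redrawn.

```python
import numpy as np

def correlated_gaps(n_gaps, rho, mean_ms=50.0, sd_ms=25.0, min_ms=2.0, rng=None):
    """Draw two gap sequences (ms) whose underlying Gaussian values have
    correlation rho; gaps below min_ms are raised to min_ms."""
    if rng is None:
        rng = np.random.default_rng()
    z1 = rng.standard_normal(n_gaps)
    z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n_gaps)
    gaps_1 = np.maximum(mean_ms + sd_ms * z1, min_ms)
    gaps_2 = np.maximum(mean_ms + sd_ms * z2, min_ms)
    return gaps_1, gaps_2

rng = np.random.default_rng(1)
g1, g2 = correlated_gaps(7, rho=0.4, rng=rng)         # one DIFFERENT trial
big1, big2 = correlated_gaps(200_000, rho=0.4, rng=rng)
print(np.round(g1, 1), np.round(g2, 1))
print("sample correlation:", round(float(np.corrcoef(big1, big2)[0, 1]), 3))
```

An eight-tone sequence contains seven gaps, hence `n_gaps=7` for a single trial. The large sample is included only to show that the lower limit on gap duration shifts the realized correlation slightly away from the nominal value.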

EXPERIMENT  1.  EFFECT  OF  TWO-CHANNEL  PRESENTATION 

The  purpose  of  the  first  experiment  was  to  examine  how 
sequence  discrimination  depended  on  the  intersequence  delay 
interval  between  the  starting  times  of  the  pair  of  sequences.  In 
addition, we wished to extend the sequence discrimination
paradigm  to  the  case  when  the  sequences  were  presented  at 
different  frequencies  and  to  different  ears. 

A.  Procedure 

In  order  to  compare  sequence  discrimination  performance  at 
short  delays  and  when  the  patterns  overlapped  in  time,  the 
sequences  were  presented  to  different  earphone  channels  and  at 
different  frequencies.  The  goal  was  to  minimize  the  sensory 
interference  between  the  channels  at  short  time  separations  (see 
Sorkin, 1965). The beginning tone of the second sequence
occurred  either  0,  10,  20,  50,  300,  635,  725,  875,  or  1375  ms 
after  the  beginning  tone  of  the  first  sequence. 

B.  Results  and  Discussion 

Figure 8 illustrates the effect of the delay and two-channel
manipulation  on  performance.  The  four  panels  of  the  figure  show 
the  performance  of  four  subjects;  the  vertical  bars  are  the 
standard errors of the mean. The circle, square, and triangle
symbols show performance at the ρ = 0.2, 0.4, and 0.6 conditions,
respectively.  Figure  9  shows  the  average  performance  of  the  four 
subjects;  the  vertical  bars  are  the  average  of  the  standard 
errors  of  the  four  subjects  in  each  condition.  The  individual 
subject plots closely resemble the average data.

Performance was best at ρ = 0.2, and lowest at ρ = 0.6, as in
the previous study (Sorkin, 1990). Performance at an
intersequence interval of zero was high, decreasing with
increasing intersequence delays. Performance was quite poor at
ISIs of 50-ms and 300-ms and at ρ = 0.6. Performance increased
as the delay increased from 300-ms to 875-ms.

The  good  performance  at  short  delays  was  consistent  with  the 
operation  of  either  a  binaural  cross-correlator  or  a  temporal 
pattern  correlator.  Since  the  former  mechanism  is  not  available 
at  long  delays,  performance  at  the  875-ms  (and  longer)  conditions 
is  consistent  with  that  for  the  pattern  correlator  mechanism. 

The  poorest  performance  was  at  delays  of  50-ms  and  300-ms, 
corresponding  to  temporal  overlaps  of  the  two  sequences  of  8%  and 
50%. To summarize: (1) either mechanism can describe performance
at pattern overlaps of more than 92%, (2) the pattern correlator
can describe performance at zero overlaps (when the binaural
comparator cannot operate), and (3) neither mechanism is
effective at overlaps of between 8 and 50%.

Figure 8. The performance (d') of four listeners is plotted as a function of the
Intersequence Interval, the time interval between the onsets of the first marker
tones in each sequence. The circle, square, and triangle symbols show, respectively,
the data for the ρ = 0.2, 0.4 and 0.6 conditions. The brackets show plus and
minus one standard error of the mean.

Figure 9. The average performance (d') of four listeners is plotted as a function of the
Intersequence Interval, the time interval between the onsets of the first marker
tones in each sequence. The circle, square, and triangle symbols show, respectively,
the data for the ρ = 0.2, 0.4 and 0.6 conditions. The brackets show the average
standard error of the mean for the four listeners.


EXPERIMENT  2.  INTERACTION  OF  INTERSEQUENCE  DELAY  AND  TEMPORAL 
TRANSFORMATION. 

The  purpose  of  this  experiment  was  to  evaluate  the  pattern 
comparison  mechanisms  operating  at  long  and  short  intersequence 
delay  intervals  by  examining  (1)  the  interaction  between  the 
sequence  delay  and  time  transformation  manipulations  at  short 
delays,  and  (2)  the  decrease  in  performance  at  delays  much  longer 
than  in  the  previous  experiment.  The  results  of  experiment  1 
indicated  that  either  comparator  mechanism  could  describe 
performance at short intersequence delays. A temporal
manipulation was added that we believed would interfere with the
operation of one of the two putative comparison mechanisms. The
manipulation was a random temporal transformation applied to the
second of the two sequences, similar to that described in Study I.
This transformation was a uniform compression or expansion of all
of the times (marker tones and gaps) comprising the second sequence.

A.  Procedure 

Experiment 2 was similar to Experiment 1, except that an
additional manipulation was performed on the sequences. This
manipulation multiplied all time intervals in the second sequence
by a constant, i.e., all marker tone durations and intertone gaps
were scaled by a factor of 0.8, 0.9, 1.0, 1.1, or 1.2. This
factor was uniformly applied to all the time intervals within a
single sequence, but could vary randomly over the experimental
trials. The manipulation also had the effect of modifying the
standard deviation of the intertone durations. Each of the five
factors was chosen with probability 0.2. As in Experiment 1, the
subject was required to indicate whether the temporal pattern of
tones was the same or different, regardless of whether the overall
tempo of the pattern had been scaled faster or slower by the time
transformation. In Experiment 2 the beginning tone of the second
sequence occurred either 10, 350, 900, or 2500-ms after the
beginning tone of the first sequence (independent of the temporal
transformation on the second sequence).
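
The uniform time transformation described above can be sketched in a few lines. This is only an illustration, not the software used in the experiment: the function names and the example interval values are hypothetical. It shows that a uniform scaling changes the standard deviation of the intervals but leaves the Pearson correlation between two identical patterns at 1.

```python
import math
import random
import statistics

# Scale factors from the procedure; each is chosen with probability 0.2.
SCALE_FACTORS = (0.8, 0.9, 1.0, 1.1, 1.2)


def scale_sequence(intervals_ms, factor):
    """Multiply every interval (tone durations and gaps) by one constant."""
    return [factor * t for t in intervals_ms]


def random_transform(intervals_ms, rng):
    """Pick one factor uniformly at random and apply it to the whole sequence."""
    factor = rng.choice(SCALE_FACTORS)
    return factor, scale_sequence(intervals_ms, factor)


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(1)
    first = [40.0, 65.0, 30.0, 55.0, 70.0]  # hypothetical intervals in ms
    factor, second = random_transform(first, rng)
    print("factor:", factor)
    print("SD before and after:", statistics.pstdev(first), statistics.pstdev(second))
    print("correlation:", pearson_r(first, second))
```

Because the correlation is unchanged by the scaling, a comparator that relies only on the correlation of time intervals would, in the absence of internal noise, be unaffected by this manipulation.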

B.  Results  and  Discussion 

Figure 10 shows the effects of the delay and expansion
manipulation on performance for the $\rho_{ex}$ = 0 conditions. The four
panels of the figure show the performance of four subjects; the
vertical bars are the standard errors of the mean. The circle
symbols show performance under no time transformation, and the
triangles show performance under random time transformations of
the sequence patterns. The average data for the four subjects
are shown in the three panels of Figure 11. These show the
averaged data for the $\rho_{ex}$ = 0, 0.4, and 0.8 conditions,
respectively. The vertical bars are the average of the standard
errors of the four subjects in each condition. The individual
subject plots are highly similar to the average data, and the
data obtained under different values of $\rho_{ex}$ are also quite
similar.

As in Experiment 1, performance was best at the lowest
values of $\rho_{ex}$ and poorest at an ISI of 350-ms. Performance at an
intersequence delay of 10-ms was high, decreasing with increasing
delays. Performance was quite poor at a delay of 350-ms,
increased as the delay increased to 900-ms, and then decreased
somewhat at 2500-ms.

It is clear that the addition of the temporal manipulation
caused performance at the shortest ISI to drop to the lowest levels.
Performance at long ISIs, however, was relatively unaffected by the
manipulation.
Thus,  it  is  reasonable  to  conclude  that  the  pattern  correlator 
mechanism  is  much  less  sensitive  to  the  time  transformation 
manipulation,  a  result  consistent  with  the  previous  experiment 
(over  the  current  range  of  time  compression).  At  short  ISIs, 
however,  the  effect  of  the  time  transformation  is  large.  The 
results  suggest  that  waveform  correlation  is  the  active  mechanism 
at  short  ISIs  and  that  it  is  sensitive  to  the  temporal 
manipulation.  This  conclusion  is  consistent  with  our 
expectations  about  the  binaural  comparator  and  its  probable 
sensitivity  to  temporal  manipulations  that  disturb  the  coherence 
of  the  patterns  to  be  compared. 

Figure 10. The performance (d') of four listeners is plotted as a function of the
Intersequence Interval, the time interval between the onsets of the first marker
tones in each sequence, for the $\rho_{ex}$ = 0 condition. The circle and square symbols
show, respectively, the data obtained in the no-transformation and
time-transformation conditions. The brackets show plus and minus one standard error
of the mean.

EXPERIMENT 3. INTERACTION OF TEMPORAL TRANSFORMATIONS AND
FREQUENCY UNCERTAINTY.

The  purpose  of  this  experiment  was  to  examine  the  possible 
interactions  between  the  effects  of  some  spectral  and  temporal 
manipulations  to  the  stimulus  patterns. 

A.  Procedure 

The experimental procedure was the same as in Experiment 2,
except that (1) only an ISI of 900-ms was employed, and (2) the
intertone standard deviation was 20-ms. An additional spectral
manipulation was employed: in this condition the frequency of
the sinusoidal marker tones comprising each sequence was randomly
varied. Instead of the sequences presented to the left and right
earphone channels always being composed of 1000 Hz and 2300 Hz
marker tones, respectively, the frequency of each marker tone was
randomly set. Although the pattern of frequencies forming each
sequence varied over trials, the particular random binary
sequence was repeated in both sequences within a trial. There
were four experimental conditions:

(1)  Conditions  same  as  Experiment  1,  no  time  transformations; 

(2)  Conditions  same  as  Experiment  1,  with  time  transformations; 

(3) Random (binary) pattern of marker frequencies, no time
transformations; and

(4) Random (binary) pattern of marker frequencies, with time
transformations.

B.  Results  and  Discussion 

Figure 12 shows the average data obtained from four subjects
in the experiment; the three panels show the data for the $\rho_{ex}$ = 0,
0.4, and 0.8 conditions, respectively. The vertical bars show
the average of the standard errors of the four subjects. The FXD
condition represents the fixed frequency manipulation in
conditions 1 and 2, while the RFA condition (for random frequency
across trials) indicates the frequency manipulation in conditions
3 and 4. There was little or no interaction between the spectral
and temporal manipulations.

GENERAL DISCUSSION


The  results  of  experiment  1  indicated  that  listeners  could 
discriminate  between  the  temporal  patterns,  even  when  the 
patterns  contained  tones  of  different  frequency  and  were 
presented  to  separate  ears.  Performance  was  good  when  the 
sequences  were  presented  either  at  very  short  or  very  long  time 
delays. The results at very short delays were consistent with
the predictions of both the pattern correlation model and the
waveform correlation model. Since it is unlikely that the
binaural correlation mechanism can function when the
intersequence delay exceeds 20-ms, pattern correlation is the
model of choice for long delay conditions.

Discrimination  performance  was  poorest  when  the  onset  delay 
exceeded  20-ms  and  the  sequences  overlapped  in  time.  Why  is 
performance  so  poor  when  the  sequences  overlap  in  time?  It  is 
possible  that  the  pattern  correlation  mechanism  can  function 
effectively  only  for  sequentially  presented  stimuli;  sequential 
processing  may  be  a  necessary  condition  for  this  mechanism  to 
operate. 

Experiment 2 added a condition of temporal compression or
expansion, enabling differentiation of the effects of the
intersequence delay interval on performance. The major effect of
this temporal manipulation was in the 10-ms condition, where
performance decreased greatly. This time manipulation was
expected to adversely affect the performance of the (binaural)
waveform correlator. Therefore, when considered together, the
results support a two-phase mechanism: when the intersequence
onset delay is less than 20-ms, the binaural correlator is the
active mechanism; when it is greater than 20-ms, the pattern
correlator is the active mechanism.

In  experiment  3,  an  additional  condition  was  tested  in  which 
the  tone  frequencies  were  randomly  varied  within  each  sequence. 
The  same  pattern  of  tone  frequencies  was  present  in  each  of  the 
pair  of  sequences  on  a  trial.  This  spectral  manipulation 
produced  a  small  drop  in  performance.  The  effects  of  the 
spectral  and  temporal  manipulations  did  not  interact.  That 
is,  the  addition  of  spectral  uncertainty  did  not  potentiate  the 
effect  of  temporal  compression.  This  result  is  consistent  with 
the  hypothesized  pattern  correlation  mechanism,  in  which  the 
listener  extracts  temporal  information  from  the  sequences  and 
discards  the  spectral  information.  However,  a  more  convincing 
demonstration  of  the  independence  of  spectral  and  temporal 
manipulations  would  be  to  randomize  the  tone  frequencies  within, 
as  well  as  across,  trials.  Some  conditions  of  this  type  were 
run,  but  the  subjects  found  this  condition  exceedingly  difficult. 
For  most  of  the  subjects,  performance  was  near  chance;  time 
constraints  prevented  running  a  sufficient  number  of  trials  to 
draw  any  conclusions  from  these  conditions. 

V. DERIVATION OF THE PATTERN CORRELATION MODEL

Pattern Correlation Model

The  time  intervals  between  tones  were  generated  by  a  process 
that  enabled  experimental  control  of  the  mean  and  standard 
deviation  of  the  intertone  intervals.  The  intervals  were 
generated  by  combining  the  three  independent,  normal  random 
variables: 

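$$X_a,\ X_b,\ X_c, \quad \text{where } \mu_a = \mu_b = 0 \text{ and } \sigma_a^2 = \sigma_b^2$$

The random variables were combined to form the two sequences of
interonset times $\{X_1\}$ and $\{X_2\}$:

$$X_1 = X_a + X_c \quad \text{and} \quad X_2 = X_b + X_c \tag{A1a,b}$$

where

$$E[X_1] = E[X_2] = \mu_c \quad \text{and} \quad \mathrm{Var}[X_1] = \mathrm{Var}[X_2] = \sigma_a^2 + \sigma_c^2$$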

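To compute the correlation between the sequences $\{X_1\}$ and $\{X_2\}$:

$$\rho_{X_1,X_2} = \frac{\mathrm{Cov}(X_1,X_2)}{\sigma_{X_1}\sigma_{X_2}} \tag{A2}$$

$$\begin{aligned}
\mathrm{Cov}(X_1,X_2) &= E[(X_1-\mu_1)(X_2-\mu_2)] \\
&= E[(X_a+X_c)(X_b+X_c)] - \mu_c E[X_c] - \mu_c E[X_c] + \mu_c\mu_c = \sigma_c^2
\end{aligned} \tag{A3}$$

$$\rho_{X_1,X_2} = \frac{\sigma_c^2}{\sigma_a^2 + \sigma_c^2} \tag{A4}$$

On SAME trials of the experiment, $\rho_{X_1,X_2}$ is set to 1.0, and
on DIFFERENT trials $\rho_{X_1,X_2}$ is set to $\rho_{ex}$.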
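
The generating process in (A1) through (A4) can be checked numerically. The sketch below is an illustration under stated assumptions, not the original stimulus-generation code: the function names, the seed, and the parameter values (a 100-ms mean interval and a 20-ms standard deviation) are hypothetical. It splits a total variance into common and unique parts so that (A4) yields the requested correlation, then draws a pair of interval sequences. It makes no attempt to prevent negative intervals, which a real stimulus generator would need to handle.

```python
import math
import random


def component_sds(rho_ex, sigma_ex):
    """Return (sigma_a, sigma_c) so that sigma_c^2 / (sigma_a^2 + sigma_c^2) = rho_ex."""
    sigma_c = sigma_ex * math.sqrt(rho_ex)
    sigma_a = sigma_ex * math.sqrt(1.0 - rho_ex)
    return sigma_a, sigma_c


def make_pair(n, mu_c, rho_ex, sigma_ex, rng):
    """Draw two sequences X1 = Xa + Xc and X2 = Xb + Xc as in (A1a,b)."""
    sigma_a, sigma_c = component_sds(rho_ex, sigma_ex)
    x1, x2 = [], []
    for _ in range(n):
        xc = rng.gauss(mu_c, sigma_c)              # common component
        x1.append(rng.gauss(0.0, sigma_a) + xc)    # Xa + Xc
        x2.append(rng.gauss(0.0, sigma_a) + xc)    # Xb + Xc
    return x1, x2


def sample_correlation(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(7)
    x1, x2 = make_pair(100000, 100.0, 0.4, 20.0, rng)
    print("sample correlation:", sample_correlation(x1, x2))
```

With a long sequence the sample correlation should fall close to the requested value; with the short sequences used in a listening experiment it will vary considerably from trial to trial.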

Suppose that the listener's response is based on the Pearson
product-moment correlation, $r_{12}$, computed on the interonset
intervals. The Fisher r to Z transformation yields a decision
variable that is approximately normally distributed (Brunk, 1960):

$$Z = \frac{1}{2}\ln\left[\frac{1+r_{12}}{1-r_{12}}\right] \tag{A5}$$

The mean and standard deviation of Z are:

$$\mu_Z = \frac{1}{2}\ln\left[\frac{1+\rho}{1-\rho}\right] + \frac{\rho}{2n-1} \quad \text{and} \quad \sigma_Z = (n-3)^{-1/2} \tag{A6a,b}$$

Then, d' is given by the difference between the means of the Z
statistic on DIFFERENT and SAME trials, divided by the standard
deviation of Z:

$$d' = (n-3)^{1/2}\left[\frac{1}{2}\ln\left(\frac{1+\rho_{SAME}}{1-\rho_{SAME}}\right) + \frac{\rho_{SAME}}{2n-1} - \frac{1}{2}\ln\left(\frac{1+\rho_{DIFF}}{1-\rho_{DIFF}}\right) - \frac{\rho_{DIFF}}{2n-1}\right] \tag{A7}$$
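
As a worked illustration of (A6) and (A7), the following sketch computes the predicted d' from the two effective correlations and the number of intervals n. The function names are hypothetical, and the bias term $\rho/(2n-1)$ is taken as printed in the text. Because the logarithm is undefined at a correlation of exactly 1, the sketch only accepts correlations strictly between -1 and 1.

```python
import math


def fisher_z_mean(rho, n):
    """Mean of the Fisher Z statistic for population correlation rho, per (A6a)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + rho) / (1.0 - rho)) + rho / (2 * n - 1)


def d_prime(rho_same, rho_diff, n):
    """Predicted d' per (A7): difference of Z means divided by sigma_Z = (n-3)^(-1/2)."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    return math.sqrt(n - 3) * (fisher_z_mean(rho_same, n) - fisher_z_mean(rho_diff, n))


if __name__ == "__main__":
    # Hypothetical values: 12 intervals, effective correlations 0.9 and 0.4.
    print(d_prime(0.9, 0.4, 12))
```

In this form d' grows with the number of intervals and with the separation between the two effective correlations.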
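
We postulate an internal uncorrelated jitter, $\sigma_{in}$, associated
with the human listener's encoding and storage of the interonset
times. The independent, normal random variables $X_{in1}$ and $X_{in2}$ are
added, respectively, to each sequence:

$$X_1 = X_a + X_c + X_{in1} \quad \text{and} \quad X_2 = X_b + X_c + X_{in2} \tag{A8a,b}$$

where $X_a$, $X_b$, $X_c$, $X_{in1}$, $X_{in2}$ are all pair-wise independent and

$$\mu_{in1} = \mu_{in2} = 0 \quad \text{and} \quad \sigma_{in1}^2 = \sigma_{in2}^2 = \sigma_{in}^2$$

then

$$E[X_1] = E[X_2] = \mu_c \quad \text{and} \quad \mathrm{Var}[X_1] = \mathrm{Var}[X_2] = \sigma_a^2 + \sigma_c^2 + \sigma_{in}^2$$

$$\mathrm{Cov}(X_1,X_2) = E[X_c^2] - \mu_c^2 = \sigma_c^2 \tag{A9}$$

$$\rho_{X_1,X_2} = \frac{\sigma_c^2}{\left[(\sigma_a^2+\sigma_c^2+\sigma_{in}^2)(\sigma_b^2+\sigma_c^2+\sigma_{in}^2)\right]^{1/2}} = \frac{\sigma_c^2}{\sigma_a^2+\sigma_c^2+\sigma_{in}^2} \tag{A10}$$

let

$$\sigma_{ex}^2 = \sigma_a^2 + \sigma_c^2 \tag{A11}$$

then

$$\rho_{X_1,X_2} = \frac{\sigma_c^2}{\sigma_{ex}^2+\sigma_{in}^2} \tag{A12}$$

$$= \frac{\sigma_c^2/\sigma_{ex}^2}{1+(\sigma_{in}/\sigma_{ex})^2} \tag{A13}$$

the effective correlations will be:

$$\rho_{SAME} = \frac{1}{1+(\sigma_{in}/\sigma_{ex})^2} \quad \text{and} \quad \rho_{DIFF} = \frac{\rho_{ex}}{1+(\sigma_{in}/\sigma_{ex})^2} \tag{A14a,b}$$

Thus, the internal noise tends to reduce the correlation between
the sequences, so that the correlation is less than 1.0 on SAME
trials and less than $\rho_{ex}$ on DIFFERENT trials.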
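
The attenuation in (A14a,b) is simple enough to evaluate directly. The sketch below is a minimal illustration with a hypothetical function name and hypothetical parameter values; it returns the effective SAME and DIFFERENT correlations for a given external standard deviation and internal jitter.

```python
def effective_correlations(rho_ex, sigma_ex, sigma_in):
    """Effective (rho_same, rho_diff) per (A14a,b) for internal jitter sigma_in."""
    if sigma_ex <= 0:
        raise ValueError("sigma_ex must be positive")
    attenuation = 1.0 / (1.0 + (sigma_in / sigma_ex) ** 2)
    return attenuation, rho_ex * attenuation


if __name__ == "__main__":
    # Hypothetical values: 20-ms external SD, 10-ms internal jitter, rho_ex = 0.4.
    print(effective_correlations(0.4, 20.0, 10.0))
```

Both correlations are scaled by the same factor, so the internal jitter lowers the SAME correlation below 1.0 by the same proportion that it lowers the DIFFERENT correlation below its nominal value.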

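Effect of Duration Transformations

We wish to determine the effect of multiplying each
sequence, respectively, by the multiplicative factors $k_1$ and $k_2$.

That is, we let

$$X_1 = k_1 X_a + k_1 X_c + X_{in1} \quad \text{and} \quad X_2 = k_2 X_b + k_2 X_c + X_{in2} \tag{A15a,b}$$

$$E[X_1] = k_1\mu_c, \quad E[X_2] = k_2\mu_c \tag{A16a,b}$$

$$\mathrm{Var}[X_1] = k_1^2\sigma_a^2 + k_1^2\sigma_c^2 + \sigma_{in}^2, \quad \mathrm{Var}[X_2] = k_2^2\sigma_b^2 + k_2^2\sigma_c^2 + \sigma_{in}^2 \tag{A17a,b}$$

$$\mathrm{Cov}(X_1,X_2) = E[(X_1 - k_1\mu_c)(X_2 - k_2\mu_c)] = k_1 k_2 E[X_c^2] - k_1 k_2 \mu_c^2 = k_1 k_2 \sigma_c^2 \tag{A18}$$

$$\rho_{X_1,X_2} = \frac{k_1 k_2 \sigma_c^2}{\left[(k_1^2\sigma_a^2 + k_1^2\sigma_c^2 + \sigma_{in}^2)(k_2^2\sigma_b^2 + k_2^2\sigma_c^2 + \sigma_{in}^2)\right]^{1/2}} \tag{A19}$$

$$= k_1 k_2\, \rho_{ex} \left[k_1^2 + (\sigma_{in}/\sigma_{ex})^2\right]^{-1/2} \left[k_2^2 + (\sigma_{in}/\sigma_{ex})^2\right]^{-1/2} \tag{A20}$$

then

$$\rho_{SAME} = k_1 k_2 \left[k_1^2 + (\sigma_{in}/\sigma_{ex})^2\right]^{-1/2} \left[k_2^2 + (\sigma_{in}/\sigma_{ex})^2\right]^{-1/2} \tag{A21}$$

$$\rho_{DIFF} = k_1 k_2\, \rho_{ex} \left[k_1^2 + (\sigma_{in}/\sigma_{ex})^2\right]^{-1/2} \left[k_2^2 + (\sigma_{in}/\sigma_{ex})^2\right]^{-1/2} \tag{A22}$$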

Following the same arguments, it can be shown that the addition
of a constant time interval to either (or both) of the sequences,
e.g. $X_1 = X_a + X_c + X_{in1} + t_1$, has no effect on $\rho_{SAME}$ or $\rho_{DIFF}$.

Note that if there is no internal noise, the multiplicative
transformations have no effect on performance, since then
$\rho_{SAME} = 1$ and $\rho_{DIFF} = \rho_{ex}$.
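
The effect of the multiplicative factors in (A21) and (A22) can be explored numerically. The sketch below uses a hypothetical function name and hypothetical parameter values. It reproduces two properties stated in the derivation: with no internal noise the factors cancel, and with both factors equal to 1 the expressions reduce to (A14a,b).

```python
import math


def scaled_correlations(rho_ex, sigma_ex, sigma_in, k1, k2):
    """Effective (rho_same, rho_diff) per (A21) and (A22) for scale factors k1, k2."""
    if sigma_ex <= 0 or k1 <= 0 or k2 <= 0:
        raise ValueError("sigma_ex, k1, and k2 must be positive")
    q = (sigma_in / sigma_ex) ** 2                      # noise-to-signal variance ratio
    gain = k1 * k2 / math.sqrt((k1 ** 2 + q) * (k2 ** 2 + q))
    return gain, rho_ex * gain


if __name__ == "__main__":
    # Hypothetical values: second sequence compressed to 0.8 of its duration.
    print(scaled_correlations(0.4, 20.0, 10.0, 1.0, 0.8))
    # No internal noise: the scale factors cancel.
    print(scaled_correlations(0.4, 20.0, 0.0, 1.0, 0.8))
```

Under this model, compression matters only through the ratio of internal jitter to the scaled external variability, so a compressed sequence is penalized more than an expanded one for the same jitter.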